Bug 51878

Summary: Add WebKit1 API to view and delete local storage
Product: WebKit Reporter: Anton D'Auria <adauria>
Component: WebKit APIAssignee: Anton D'Auria <adauria>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, adele, ap, beidson, buildbot, bweinstein, commit-queue, ddkilzer, dglazkov, eric.carlson, eric, fishd, gustavo.noronha, gustavo, jberlin, joepeck, jorlow, kdecker, laszlo.gombos, levin, michaeln, sam, sfalken, webkit-ews, webkit.review.bot, xan.lopez
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Bug Depends on:    
Bug Blocks: 55172, 56303    
Attachments:
Description Flags
Added LocalStorageTracker to WebCore to track creation of LocalStorage dbs, and an API to list and delete them.
jorlow: review-
[PATCH] recovery from out-of-sync db, StorageTrackerClient
ddkilzer: review-, ddkilzer: commit-queue-
[PATCH] Testing against bots
none
[PATCH] with tests, for bots
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
[PATCH] protect m_client, always open tracker db on initialization, reorder lock taking
none
Patch levin: review-

Description Anton D'Auria 2011-01-04 10:16:14 PST
API should provide the following:

- get a list of origins that have local storage
- delete local storage for an origin
- delete all local storage
Comment 1 Jessie Berlin 2011-01-04 10:21:48 PST
<rdar://problem/8745985>
Comment 2 Anton D'Auria 2011-02-03 18:19:22 PST
Created attachment 81164 [details]
Added LocalStorageTracker to WebCore to track creation of LocalStorage dbs, and an API to list and delete them.

Need to implement StorageTrackerClient in DRT to run reliable tests. StorageAreaSync is not guaranteed to create a db (and a StorageTracker entry) before DRT JS queries StorageTracker. Since we can't add delays to layout tests, the next step is to implement StorageTrackerClient that DRT can implement.
Comment 3 WebKit Review Bot 2011-02-03 20:29:55 PST
Attachment 81164 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1

Source/WebCore/page/PageGroup.h:57:  The parameter name "origin" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/WebCore/storage/StorageTracker.cpp:30:  Found other header before a header this file implements. Should be: config.h, primary header, blank line, and then alphabetically sorted.  [build/include_order] [4]
Source/WebCore/storage/StorageTracker.cpp:31:  Found header this file implements after other header. Should be: config.h, primary header, blank line, and then alphabetically sorted.  [build/include_order] [4]
Source/WebCore/storage/StorageTracker.cpp:33:  Alphabetical sorting problem.  [build/include_order] [4]
Source/WebCore/storage/StorageTracker.cpp:35:  Alphabetical sorting problem.  [build/include_order] [4]
Source/WebCore/storage/StorageTracker.cpp:38:  Alphabetical sorting problem.  [build/include_order] [4]
Source/WebCore/storage/StorageTracker.cpp:116:  One line control clauses should not use braces.  [whitespace/braces] [4]
Source/WebCore/storage/StorageTracker.cpp:175:  Local variables should never be PassOwnPtr (see http://webkit.org/coding/RefPtr.html).  [readability/pass_ptr] [5]
Source/WebCore/storage/StorageTracker.cpp:427:  One line control clauses should not use braces.  [whitespace/braces] [4]
Source/WebCore/storage/StorageTrackerClient.h:35:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
Source/WebCore/storage/StorageTracker.h:38:  Alphabetical sorting problem.  [build/include_order] [4]
Source/WebCore/storage/StorageTracker.h:40:  Alphabetical sorting problem.  [build/include_order] [4]
Source/WebCore/storage/StorageTracker.h:51:  Weird number of spaces at line-start.  Are you using a 4-space indent?  [whitespace/indent] [3]
Source/WebCore/storage/StorageTracker.h:51:  More than one command on the same line  [whitespace/newline] [4]
Source/WebCore/storage/StorageTracker.h:63:  The parameter name "origin" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/WebCore/storage/StorageTracker.h:83:  The parameter name "path" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/WebCore/storage/StorageTrackerClient.cpp:29:  Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted.  [build/include_order] [4]
Source/WebCore/storage/StorageNamespace.h:53:  The parameter name "origin" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/WebCore/storage/StorageNamespaceImpl.cpp:159:  One line control clauses should not use braces.  [whitespace/braces] [4]
Source/WebCore/storage/StorageNamespaceImpl.h:54:  The parameter name "origin" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 20 in 29 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 WebKit Review Bot 2011-02-03 21:00:22 PST
Attachment 81164 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/7694725
Comment 5 Build Bot 2011-02-03 21:25:26 PST
Attachment 81164 [details] did not build on win:
Build output: http://queues.webkit.org/results/7698381
Comment 6 Gustavo Noronha (kov) 2011-02-03 21:44:04 PST
Attachment 81164 [details] did not build on gtk:
Build output: http://queues.webkit.org/results/7692757
Comment 7 Early Warning System Bot 2011-02-03 21:54:59 PST
Attachment 81164 [details] did not build on qt:
Build output: http://queues.webkit.org/results/7694758
Comment 8 Jeremy Orlow 2011-02-04 10:20:24 PST
Comment on attachment 81164 [details]
Added LocalStorageTracker to WebCore to track creation of LocalStorage dbs, and an API to list and delete them.

View in context: https://bugs.webkit.org/attachment.cgi?id=81164&action=review

My biggest question is why we need a tracker database.  I'm fundamentally against duplicating state since such duplicate state can get out of sync since the number of possible error conditions quickly become very difficult to test for and handle.  To be honest, I didn't read the tracker class in detail (yet) so maybe it's obvious from the code, but if so then there probably should have been something in the changeLog as well.  If there is a need, then I'd like to see tests for the various error conditions go in with this patch.  For example, what happens if there's an entry in the tracker, but the db doesn't exist.  Or vice versa.  Etc.

> Source/WebCore/page/PageGroup.cpp:122
> +

not sure this newline is useful

> Source/WebCore/page/PageGroup.cpp:135
> +

not sure this newline is useful

> Source/WebCore/page/PageGroup.cpp:141
> +}

Put a newline after this.  (It's odd you added one before the #endif later in the file, but didn't do one here. :-)

> Source/WebCore/storage/LocalStorageTask.h:51
> +        static PassOwnPtr<LocalStorageTask> createOriginIdentifiersImport() { return adoptPtr(new LocalStorageTask(ImportOrigins)); }

This method of doing tasks was always clunky.  Let's not add any in the old style and instead please use createCallbackTask (http://codesearch.google.com/codesearch?q=createCallbackTask&exact_package=chromium&hl=en&vert=chromium).  It'd be really nice if you'd be willing to convert the other ones while you're at it.

> Source/WebCore/storage/StorageAreaSync.cpp:142
> +void StorageAreaSync::scheduleCloseDatabase()

What's the difference between this and scheduleFinalSync?

> Source/WebCore/storage/StorageAreaSync.cpp:157
> +        disableSuddenTermination();

I really hate these...  <thinking out loud>I wonder if we could make an object that enables it on construction and disables on destruction and create/destroy it instead of manually incrementing and decrementing</..>

> Source/WebCore/storage/StorageNamespaceImpl.cpp:155
> +    // a sync and for StorageAreaSync to recreate the backing db file.

These semantics are not at all clear from the name.  Ideally, the name would better reflect this.  But at very least this comment should be in the header.

In what case are you clearing origins?  If it's because the user asked you to, I'm not sure this is the right policy.  Allowing live pages to resurrect all their data simply by touching a single piece of data seems a bit odd.

> Source/WebCore/storage/StorageTracker.cpp:3
> + *

use http://webkit.org/coding/bsd-license.html for new files (i.e. 2 clause)

> Source/WebCore/storage/StorageTracker.cpp:148
> +    SQLiteStatement statement(m_database, "SELECT origin FROM Origins");

Why do we need a table to track stuff?  Unless there's a _really_ good reason or we're unifying all the quotas together, I don't think we should ever add this.

> Source/WebCore/storage/StorageTrackerClient.h:40
> +        virtual void dispatchDidModifyOrigin(const String& originIdentifier) = 0;

Not sure the word dispatch adds information...is there precedence for it?
Comment 9 Anton D'Auria 2011-02-04 16:16:31 PST
(In reply to comment #8)
> (From update of attachment 81164 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=81164&action=review
> 
> My biggest question is why we need a tracker database.  I'm fundamentally against duplicating state since such duplicate state can get out of sync since the number of possible error conditions quickly become very difficult to test for and handle.  To be honest, I didn't read the tracker class in detail (yet) so maybe it's obvious from the code, but if so then there probably should have been something in the changeLog as well.  If there is a need, then I'd like to see tests for the various error conditions go in with this patch.  For example, what happens if there's an entry in the tracker, but the db doesn't exist.  Or vice versa.  Etc.

The equivalent state exists only on the file system. We should use a db because we'll have relational data, like per-origin quota, and to avoid hitting the file system to see what db files are in the local storage directory and parsing them for the database identifier. StorageNamespaceImpl duplicates this partially in a map of origins because it loads key/value pairs for origins that request it. StorageTracker is kept in sync with what origins actually exist whenever StorageAreaSync attempts to open an origin's db. If StorageTracker has an entry for a db that doesn't exist, then it falsely reports that origin as having local storage, and on StorageTracker::deleteOrigin() and deleteAllOrigins() it removes the db entries even if the file system call to delete files fails.

For the case when a db exists and StorageTracker doesn't have an entry for it, this patch's behavior is the same as DatabaseTracker -- it is not reported to the client. I would like to address this in a separate patch.

> > Source/WebCore/storage/LocalStorageTask.h:51
> > +        static PassOwnPtr<LocalStorageTask> createOriginIdentifiersImport() { return adoptPtr(new LocalStorageTask(ImportOrigins)); }
> 
> This method of doing tasks was always clunky.  Let's not add any in the old style and instead please use createCallbackTask (http://codesearch.google.com/codesearch?q=createCallbackTask&exact_package=chromium&hl=en&vert=chromium).  It'd be really nice if you'd be willing to convert the other ones while you're at it.

I'd like to make this change, but I would prefer to do it in a separate patch in which all other uses of LocalStorageTask unrelated to StorageTracker are removed.

> > Source/WebCore/storage/StorageAreaSync.cpp:142
> > +void StorageAreaSync::scheduleCloseDatabase()
> 
> What's the difference between this and scheduleFinalSync?
scheduleFinalSync destroys the StorageArea, which shouldn't happen if we're just deleting the backing db. It also doesn't close the database, which we must do before deleting the db. This also forces StorageAreaSync to reopen the db on next sync, which will recreate the file.

> > Source/WebCore/storage/StorageNamespaceImpl.cpp:155
> > void StorageNamespaceImpl::clearOriginForDeletion(SecurityOrigin* origin)
> > +    // a sync and for StorageAreaSync to recreate the backing db file.
> 
> These semantics are not at all clear from the name.  Ideally, the name would better reflect this.  But at very least this comment should be in the header.
> 
> In what case are you clearing origins?  If it's because the user asked you to, I'm not sure this is the right policy.  Allowing live pages to resurrect all their data simply by touching a single piece of data seems a bit odd.

There was a bug in the patch that didn't clear the backing db's data if a live page wrote to local storage and StorageAreaSync reopened the db before StorageTracker's background thread deleted the db (hence, you saw this as resurrecting the data). This is fixed in a forthcoming revision of this patch with a call to StorageAreaSync::scheduleClear() before scheduleCloseDatabase().

> > Source/WebCore/storage/StorageTrackerClient.h:40
> > +        virtual void dispatchDidModifyOrigin(const String& originIdentifier) = 0;
> 
> Not sure the word dispatch adds information...is there precedence for it?

Yes, in Source/WebCore/storage/DatabaseTrackerClient.h.
Comment 10 Jeremy Orlow 2011-02-04 16:59:30 PST
(In reply to comment #9)
> (In reply to comment #8)
> > (From update of attachment 81164 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=81164&action=review
> > 
> > My biggest question is why we need a tracker database.  I'm fundamentally against duplicating state since such duplicate state can get out of sync since the number of possible error conditions quickly become very difficult to test for and handle.  To be honest, I didn't read the tracker class in detail (yet) so maybe it's obvious from the code, but if so then there probably should have been something in the changeLog as well.  If there is a need, then I'd like to see tests for the various error conditions go in with this patch.  For example, what happens if there's an entry in the tracker, but the db doesn't exist.  Or vice versa.  Etc.
> 
> The equivalent state exists only on the file system. We should use a db because we'll have relational data, like per-origin quota, and to avoid hitting the file system to see what db files are in the local storage directory and parsing them for the database identifier.

Can you tell me what operations are inefficient because they're not in a central database?  Deleting an origin is easy: figure out the file name and delete it.  Deleting everything is easy: delete the whole dir.  What else are you hoping to do?

> StorageNamespaceImpl duplicates this partially in a map of origins because it loads key/value pairs for origins that request it. StorageTracker is kept in sync with what origins actually exist whenever StorageAreaSync attempts to open an origin's db. If StorageTracker has an entry for a db that doesn't exist, then it falsely reports that origin as having local storage, and on StorageTracker::deleteOrigin() and deleteAllOrigins() it removes the db entries even if the file system call to delete files fails.

If there is some reason why we really do need things in a relational database, then we should have tests for all of these cases.  

> For the case when a db exists and StorageTracker doesn't have an entry for it, this patch's behavior is the same as DatabaseTracker -- it is not reported to the client. I would like to address this in a separate patch.

As long as the tracker is inert until then.
 
> > > Source/WebCore/storage/LocalStorageTask.h:51
> > > +        static PassOwnPtr<LocalStorageTask> createOriginIdentifiersImport() { return adoptPtr(new LocalStorageTask(ImportOrigins)); }
> > 
> > This method of doing tasks was always clunky.  Let's not add any in the old style and instead please use createCallbackTask (http://codesearch.google.com/codesearch?q=createCallbackTask&exact_package=chromium&hl=en&vert=chromium).  It'd be really nice if you'd be willing to convert the other ones while you're at it.
> 
> I'd like to make this change, but I would prefer to do it in a separate patch in which all other uses of LocalStorageTask unrelated to StorageTracker are removed.

That's fine.  Ideally it'd be a patch landed before this, but it's not a huge deal either way.
 
> > > Source/WebCore/storage/StorageAreaSync.cpp:142
> > > +void StorageAreaSync::scheduleCloseDatabase()
> > 
> > What's the difference between this and scheduleFinalSync?
> scheduleFinalSync destroys the StorageArea, which shouldn't happen if we're just deleting the backing db. It also doesn't close the database, which we must do before deleting the db. This also forces StorageAreaSync to reopen the db on next sync, which will recreate the file.

If it's practical to do so, it'd be nice if you could combine the methods and just pass in an enum which dictates which behavior to use.  Since you said .clear() will be called right before then, the clear empty databases code should actually take care of closing it I believe.

Btw, I totally think what you're doing with making it safe to clear live databases is good.  And I agree with the logic being put into some sort of tracker (rather than being static on the namespace class).  But I feel strongly we shouldn't create a tracker database unless we absolutely need to.  And, if we do, we should make sure we properly test all scenarios where the DB is out of sync with the FS (due to both bugs in the code or various crashing scenarios).
Comment 11 Anton D'Auria 2011-02-07 11:13:07 PST
(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #8)
> > > (From update of attachment 81164 [details] [details] [details])
> > > View in context: https://bugs.webkit.org/attachment.cgi?id=81164&action=review
> > > 
> > > My biggest question is why we need a tracker database.  I'm fundamentally against duplicating state since such duplicate state can get out of sync since the number of possible error conditions quickly become very difficult to test for and handle.  To be honest, I didn't read the tracker class in detail (yet) so maybe it's obvious from the code, but if so then there probably should have been something in the changeLog as well.  If there is a need, then I'd like to see tests for the various error conditions go in with this patch.  For example, what happens if there's an entry in the tracker, but the db doesn't exist.  Or vice versa.  Etc.
> > 
> > The equivalent state exists only on the file system. We should use a db because we'll have relational data, like per-origin quota, and to avoid hitting the file system to see what db files are in the local storage directory and parsing them for the database identifier.
> 
> Can you tell me what operations are inefficient because they're not in a central database?  Deleting an origin is easy: figure out the file name and delete it.  Deleting everything is easy: delete the whole dir.  What else are you hoping to do?

My main motivation for using a database is to add per-origin quotas. Deletion of an origin file and origins directory is easy, and the patch already does it as you describe. Getting a list of origins with local storage by parsing filenames is also not a big deal, and can be done once for the lifetime of the StorageTracker instance to make sure it is in sync. There is a precedent (perhaps a bad one) in DatabaseTracker to never sync up with the file system. It can lose track of a db file if the origin never requests access to it again.

> > > > Source/WebCore/storage/StorageAreaSync.cpp:142
> > > > +void StorageAreaSync::scheduleCloseDatabase()
> > > 
> > > What's the difference between this and scheduleFinalSync?
> > scheduleFinalSync destroys the StorageArea, which shouldn't happen if we're just deleting the backing db. It also doesn't close the database, which we must do before deleting the db. This also forces StorageAreaSync to reopen the db on next sync, which will recreate the file.
> 
> If it's practical to do so, it'd be nice if you could combine the methods and just pass in an enum which dictates which behavior to use.  Since you said .clear() will be called right before then, the clear empty databases code should actually take care of closing it I believe.

Empty databases are cleared only on final sync, which happens when all local storage is closed by PageGroup.

> Btw, I totally think what you're doing with making it safe to clear live databases is good.  And I agree with the logic being put into some sort of tracker (rather than being static on the namespace class).  But I feel strongly we shouldn't create a tracker database unless we absolutely need to.  And, if we do, we should make sure we properly test all scenarios where the DB is out of sync with the FS (due to both bugs in the code or various crashing scenarios).

If we're going to add per-origin quotas, a StorageTracker db seems like the cleanest approach.
Comment 12 Jeremy Orlow 2011-02-07 11:55:15 PST
(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #9)
> > > (In reply to comment #8)
> > > > (From update of attachment 81164 [details] [details] [details] [details])
> > > > View in context: https://bugs.webkit.org/attachment.cgi?id=81164&action=review
> > > > 
> > > > My biggest question is why we need a tracker database.  I'm fundamentally against duplicating state since such duplicate state can get out of sync since the number of possible error conditions quickly become very difficult to test for and handle.  To be honest, I didn't read the tracker class in detail (yet) so maybe it's obvious from the code, but if so then there probably should have been something in the changeLog as well.  If there is a need, then I'd like to see tests for the various error conditions go in with this patch.  For example, what happens if there's an entry in the tracker, but the db doesn't exist.  Or vice versa.  Etc.
> > > 
> > > The equivalent state exists only on the file system. We should use a db because we'll have relational data, like per-origin quota, and to avoid hitting the file system to see what db files are in the local storage directory and parsing them for the database identifier.
> > 
> > Can you tell me what operations are inefficient because they're not in a central database?  Deleting an origin is easy: figure out the file name and delete it.  Deleting everything is easy: delete the whole dir.  What else are you hoping to do?
> 
> My main motivation for using a database is to add per-origin quotas.

There already are per-origin quotas.  They're embedded in the local storage database file itself.

Unless you mean unified quotas across all storage types?  In which case creating yet another tracker would seem to pull us away from that goal?

> Deletion of an origin file and origins directory is easy, and the patch already does it as you describe. Getting a list of origins with local storage by parsing filenames is also not a big deal, and can be done once for the lifetime of the StorageTracker instance to make sure it is in sync.

Agreed.  So unless "add per-origin quotas" implies unified across storage types, I still don't see a compelling reason for this.

> There is a precedent (perhaps a bad one) in DatabaseTracker to never sync up with the file system. It can lose track of a db file if the origin never requests access to it again.

Yeah, that's definitely not a precedent we should follow.
 
> > > > > Source/WebCore/storage/StorageAreaSync.cpp:142
> > > > > +void StorageAreaSync::scheduleCloseDatabase()
> > > > 
> > > > What's the difference between this and scheduleFinalSync?
> > > scheduleFinalSync destroys the StorageArea, which shouldn't happen if we're just deleting the backing db. It also doesn't close the database, which we must do before deleting the db. This also forces StorageAreaSync to reopen the db on next sync, which will recreate the file.
> > 
> > If it's practical to do so, it'd be nice if you could combine the methods and just pass in an enum which dictates which behavior to use.  Since you said .clear() will be called right before then, the clear empty databases code should actually take care of closing it I believe.
> 
> Empty databases are cleared only on final sync, which happens when all local storage is closed by PageGroup.

True.  It's just that the code is kind of spaghetti right now and this change will make it worse.  Can you please experiment with some ways to make it all cleaner and have fewer code paths?

For example, what if a "close the DB" call were made and (before it's serviced) the page group then shuts down.  Will we handle that and other corner cases gracefully?  From the current code, it's very difficult to say either way I think.
Comment 13 Jeremy Orlow 2011-02-07 20:12:08 PST
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > (In reply to comment #9)
> > > > (In reply to comment #8)
> > > > > (From update of attachment 81164 [details] [details] [details] [details] [details])
> > > > > View in context: https://bugs.webkit.org/attachment.cgi?id=81164&action=review
> > > > > 
> > > > > My biggest question is why we need a tracker database.  I'm fundamentally against duplicating state since such duplicate state can get out of sync since the number of possible error conditions quickly become very difficult to test for and handle.  To be honest, I didn't read the tracker class in detail (yet) so maybe it's obvious from the code, but if so then there probably should have been something in the changeLog as well.  If there is a need, then I'd like to see tests for the various error conditions go in with this patch.  For example, what happens if there's an entry in the tracker, but the db doesn't exist.  Or vice versa.  Etc.
> > > > 
> > > > The equivalent state exists only on the file system. We should use a db because we'll have relational data, like per-origin quota, and to avoid hitting the file system to see what db files are in the local storage directory and parsing them for the database identifier.
> > > 
> > > Can you tell me what operations are inefficient because they're not in a central database?  Deleting an origin is easy: figure out the file name and delete it.  Deleting everything is easy: delete the whole dir.  What else are you hoping to do?
> > 
> > My main motivation for using a database is to add per-origin quotas.
> 
> There already are per-origin quotas.  They're embedded in the local storage database file itself.
> 
> Unless you mean unified quotas across all storage types?  In which case creating yet another tracker would seem to pull us away from that goal?

Can you clarify whether this is what you meant?
Comment 14 Anton D'Auria 2011-02-07 23:26:31 PST
(In reply to comment #13)
> (In reply to comment #12)
> > (In reply to comment #11)
> > > (In reply to comment #10)
> > > > (In reply to comment #9)
> > > > > (In reply to comment #8)
> > > > > > (From update of attachment 81164 [details] [details] [details] [details] [details] [details])
> > > > > > View in context: https://bugs.webkit.org/attachment.cgi?id=81164&action=review
> > > > > > 
> > > > > > My biggest question is why we need a tracker database.  I'm fundamentally against duplicating state since such duplicate state can get out of sync since the number of possible error conditions quickly become very difficult to test for and handle.  To be honest, I didn't read the tracker class in detail (yet) so maybe it's obvious from the code, but if so then there probably should have been something in the changeLog as well.  If there is a need, then I'd like to see tests for the various error conditions go in with this patch.  For example, what happens if there's an entry in the tracker, but the db doesn't exist.  Or vice versa.  Etc.
> > > > > 
> > > > > The equivalent state exists only on the file system. We should use a db because we'll have relational data, like per-origin quota, and to avoid hitting the file system to see what db files are in the local storage directory and parsing them for the database identifier.
> > > > 
> > > > Can you tell me what operations are inefficient because they're not in a central database?  Deleting an origin is easy: figure out the file name and delete it.  Deleting everything is easy: delete the whole dir.  What else are you hoping to do?
> > > 
> > > My main motivation for using a database is to add per-origin quotas.
> > 
> > There already are per-origin quotas.  They're embedded in the local storage database file itself.

Currently, there are no per-origin quotas in LocalStorage dbs, though this sounds like a good solution. StorageAreaImpl::openDatabase simply creates the db, and StorageMap::setItem enforces the fixed 5MB-per-origin quota by summing up lengths of key/value pairs.

> > Unless you mean unified quotas across all storage types?  In which case creating yet another tracker would seem to pull us away from that goal?
> 
> Can you clarify whether this is what you meant?

I meant LocalStorage quotas only, though a quota for an origin across all storage types will be a lot more useful. I couldn't find a bug for this after a quick search--is there one?
Comment 15 Jeremy Orlow 2011-02-07 23:35:14 PST
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > (In reply to comment #11)
> > > > (In reply to comment #10)
> > > > > (In reply to comment #9)
> > > > > > (In reply to comment #8)
> > > > > > > (From update of attachment 81164 [details] [details] [details] [details] [details] [details] [details])
> > > > > > > View in context: https://bugs.webkit.org/attachment.cgi?id=81164&action=review
> > > > > > > 
> > > > > > > My biggest question is why we need a tracker database.  I'm fundamentally against duplicating state since such duplicate state can get out of sync since the number of possible error conditions quickly become very difficult to test for and handle.  To be honest, I didn't read the tracker class in detail (yet) so maybe it's obvious from the code, but if so then there probably should have been something in the changeLog as well.  If there is a need, then I'd like to see tests for the various error conditions go in with this patch.  For example, what happens if there's an entry in the tracker, but the db doesn't exist.  Or vice versa.  Etc.
> > > > > > 
> > > > > > The equivalent state exists only on the file system. We should use a db because we'll have relational data, like per-origin quota, and to avoid hitting the file system to see what db files are in the local storage directory and parsing them for the database identifier.
> > > > > 
> > > > > Can you tell me what operations are inefficient because they're not in a central database?  Deleting an origin is easy: figure out the file name and delete it.  Deleting everything is easy: delete the whole dir.  What else are you hoping to do?
> > > > 
> > > > My main motivation for using a database is to add per-origin quotas.
> > > 
> > > There already are per-origin quotas.  They're embedded in the local storage database file itself.
> 
> Currently, there are no per-origin quotas in LocalStorage dbs, though this sounds like a good solution. StorageAreaImpl::openDatabase simply creates the db, and StorageMap::setItem enforces the fixed 5MB-per-origin quota by summing up lengths of key/value pairs.

Oh yeah...that's right.

Why do you say it sounds like a good solution though?  On first use, we read in all the items whether or not we're enforcing quota, so I'm not sure there's much benefit in saving it off.  (Especially since that would once again create data that can get out of sync.)
 
> > > Unless you mean unified quotas across all storage types?  In which case creating yet another tracker would seem to pull us away from that goal?
> > 
> > Can you clarify whether this is what you meant?
> 
> I meant LocalStorage quotas only, though a quota for an origin across all storage types will be a lot more useful. I couldn't find a bug for this after a quick search--is there one?

Not that I know of.  cc me if you create one

Just so you know tho: for Chromium, we want to discourage LocalStorage so we will _not_ want LocalStorage to be part of some unified quota (and we plan to decrease the existing limit in the near future as well).  But for all the other storage types, we think a unified quota will be great, and will probably work on it ourselves at some point if no one else beats us to it.
Comment 16 Anton D'Auria 2011-02-07 23:55:57 PST
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > (In reply to comment #12)
> > > > (In reply to comment #11)
> > > > > (In reply to comment #10)
> > > > > > (In reply to comment #9)
> > > > > > > (In reply to comment #8)
> > > > > > > > (From update of attachment 81164 [details] [details] [details] [details] [details] [details] [details] [details])
> > > > > > > > View in context: https://bugs.webkit.org/attachment.cgi?id=81164&action=review
> > > > > > > > 
> > > > > > > > My biggest question is why we need a tracker database.  I'm fundamentally against duplicating state since such duplicate state can get out of sync since the number of possible error conditions quickly become very difficult to test for and handle.  To be honest, I didn't read the tracker class in detail (yet) so maybe it's obvious from the code, but if so then there probably should have been something in the changeLog as well.  If there is a need, then I'd like to see tests for the various error conditions go in with this patch.  For example, what happens if there's an entry in the tracker, but the db doesn't exist.  Or vice versa.  Etc.
> > > > > > > 
> > > > > > > The equivalent state exists only on the file system. We should use a db because we'll have relational data, like per-origin quota, and to avoid hitting the file system to see what db files are in the local storage directory and parsing them for the database identifier.
> > > > > > 
> > > > > > Can you tell me what operations are inefficient because they're not in a central database?  Deleting an origin is easy: figure out the file name and delete it.  Deleting everything is easy: delete the whole dir.  What else are you hoping to do?
> > > > > 
> > > > > My main motivation for using a database is to add per-origin quotas.
> > > > 
> > > > There already are per-origin quotas.  They're embedded in the local storage database file itself.
> > 
> > Currently, there are no per-origin quotas in LocalStorage dbs, though this sounds like a good solution. StorageAreaImpl::openDatabase simply creates the db, and StorageMap::setItem enforces the fixed 5MB-per-origin quota by summing up lengths of key/value pairs.
> 
> Oh yeah...that's right.
> 
> Why do you say it sounds like a good solution though?  On first use, we read in all the items whether or not we're enforcing quota, so I'm not sure there's much benefit in saving it off.  (Especially since that would once again create data that can get out of sync.)

We can set a maximum size on the backing db itself, and handle insertion failure due to quotas the same way as is done by Databases. An existing db will not have more than the default 5MB. If a user wishes to lower the quota below current usage, he should probably delete local storage for the origin.

> 
> > > > Unless you mean unified quotas across all storage types?  In which case creating yet another tracker would seem to pull us away from that goal?
> > > 
> > > Can you clarify whether this is what you meant?
> > 
> > I meant LocalStorage quotas only, though a quota for an origin across all storage types will be a lot more useful. I couldn't find a bug for this after a quick search--is there one?
> 
> Not that I know of.  cc me if you create one

I created http://webkit.org/b/53983 - Implement unified per-origin quota tracker across all storage types.
Comment 17 Jeremy Orlow 2011-02-08 11:00:07 PST
(In reply to comment #16)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > (In reply to comment #13)
> > > > (In reply to comment #12)
> > > > > (In reply to comment #11)
> > > > > > (In reply to comment #10)
> > > > > > > (In reply to comment #9)
> > > > > > > > (In reply to comment #8)
> > > > > > > > > (From update of attachment 81164 [details] [details] [details] [details] [details] [details] [details] [details] [details])
> > > > > > > > > View in context: https://bugs.webkit.org/attachment.cgi?id=81164&action=review
> > > > > > > > > 
> > > > > > > > > My biggest question is why we need a tracker database.  I'm fundamentally against duplicating state since such duplicate state can get out of sync since the number of possible error conditions quickly become very difficult to test for and handle.  To be honest, I didn't read the tracker class in detail (yet) so maybe it's obvious from the code, but if so then there probably should have been something in the changeLog as well.  If there is a need, then I'd like to see tests for the various error conditions go in with this patch.  For example, what happens if there's an entry in the tracker, but the db doesn't exist.  Or vice versa.  Etc.
> > > > > > > > 
> > > > > > > > The equivalent state exists only on the file system. We should use a db because we'll have relational data, like per-origin quota, and to avoid hitting the file system to see what db files are in the local storage directory and parsing them for the database identifier.
> > > > > > > 
> > > > > > > Can you tell me what operations are inefficient because they're not in a central database?  Deleting an origin is easy: figure out the file name and delete it.  Deleting everything is easy: delete the whole dir.  What else are you hoping to do?
> > > > > > 
> > > > > > My main motivation for using a database is to add per-origin quotas.
> > > > > 
> > > > > There already are per-origin quotas.  They're embedded in the local storage database file itself.
> > > 
> > > Currently, there are no per-origin quotas in LocalStorage dbs, though this sounds like a good solution. StorageAreaImpl::openDatabase simply creates the db, and StorageMap::setItem enforces the fixed 5MB-per-origin quota by summing up lengths of key/value pairs.
> > 
> > Oh yeah...that's right.
> > 
> > Why do you say it sounds like a good solution though?  On first use, we read in all the items whether or not we're enforcing quota, so I'm not sure there's much benefit in saving it off.  (Especially since that would once again create data that can get out of sync.)
> 
> We can set a maximum size on the backing db itself, and handle insertion failure due to quotas the same way as is done by Databases. An existing db will not have more than the default 5MB. If a user wishes to lower the quota below current usage, he should probably delete local storage for the origin.

Is there any reason that enforcing 5MB total vs. 5MB of user data is particularly important?  Or an important reason to make this enforce quota the same way WebSQLDatabase does?  To be honest, I see a lot of new code/risk for very little upside.  If there's some reason it's absolutely necessary for Safari, then maybe we can beat this patch into something everyone's happy with.  But if that's not the case, I think the tracker part of this patch should be scaled back into a refactoring of the static bits of StorageNamespace into its own class (possibly named Tracker..but not even sure about that).
 
> > 
> > > > > Unless you mean unified quotas across all storage types?  In which case creating yet another tracker would seem to pull us away from that goal?
> > > > 
> > > > Can you clarify whether this is what you meant?
> > > 
> > > I meant LocalStorage quotas only, though a quota for an origin across all storage types will be a lot more useful. I couldn't find a bug for this after a quick search--is there one?
> > 
> > Not that I know of.  cc me if you create one
> 
> I created http://webkit.org/b/53983 - Implement unified per-origin quota tracker across all storage types.

Thanks
Comment 18 Jeremy Orlow 2011-02-08 11:07:07 PST
Oh!!!  I finally just understood what you meant by "per-origin".  You're saying you want each origin to be able to have a different quota.  (I thought you meant the value is tracked on a per-origin basis.)

In that case, a tracker makes much more sense.  And adding a table to each database that keeps track of the allowed quota for that DB (and maybe also other metadata in the future?) also makes sense.  Or, if you're planning on having some UI that allows someone to see all the individual limits, then I guess it makes sense to create a new DB like you did (so you don't need to individually open all localStorage dbs)...but I still definitely want to see tests that hit the corner cases.

Sorry for my confusion.
Comment 19 Jessie Berlin 2011-02-23 14:23:07 PST
<rdar://problem/9044233>
Comment 20 Anton D'Auria 2011-02-23 18:15:53 PST
Created attachment 83594 [details]
[PATCH] recovery from out-of-sync db, StorageTrackerClient

- sync up storage tracker db with files on disk on startup
- implemented StorageTrackerClient for mac
- fix for a race condition in which a local storage db isn't cleared in time before StorageAreaSync writes new items and cancels the deletion

This is a bit tricky to test with layout tests because StorageTracker is async and doesn't generate DOM events. I'd like to land this patch with a disabled StorageTracker until I finish tests.
Comment 21 WebKit Review Bot 2011-02-23 18:22:03 PST
Attachment 83594 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/7985317
Comment 22 Build Bot 2011-02-23 19:21:08 PST
Attachment 83594 [details] did not build on win:
Build output: http://queues.webkit.org/results/7986313
Comment 23 Joseph Pecoraro 2011-02-23 22:15:02 PST
Comment on attachment 83594 [details]
[PATCH] recovery from out-of-sync db, StorageTrackerClient

View in context: https://bugs.webkit.org/attachment.cgi?id=83594&action=review

Unfortunately all I really have are style comments and a few general comments.
I am not very familiar with local storage, so I can't do a full review of this. I could
give this a second pass and try to find more logical issues.

> Source/WebKit/mac/WebCoreSupport/WebSecurityOrigin.mm:64
> +- (NSString*)databaseIdentifier

Nit (style): star is on the right for Obj-C types => "NSString *"

> Source/WebKit/mac/WebCoreSupport/WebSecurityOriginPrivate.h:43
> +- (NSString*)databaseIdentifier;

ditto.

> Source/WebKit/mac/Storage/WebStorageManager.mm:25
> +

Nit: This file should have an ENABLE(DOM_STORAGE) guard.

> Source/WebKit/mac/Storage/WebStorageManager.mm:37
> +NSString *WebStorageDirectoryDefaultsKey = @"WebKitLocalStorageDatabasePathPreferenceKey";
> +NSString *WebStorageDidModifyOriginNotification = @"WebStorageDidModifyOriginNotification";

I see this is the same thing that mac/Storage/WebDatabaseManager.mm does. I don't know when we use NSUserDefaults over WebPreferences, but still I think we can improve these constants a bit. I think they should be marked const. like => "NSString * const ... = ". (looking at WebViewDidBeginEditingNotification from WebView.h/mm as my example).

> Source/WebKit/mac/Storage/WebStorageManager.mm:57
> +    for (unsigned i = 0; i < coreOrigins.size(); ++i) {

Nit: Use type "size_t" for i, because it is compared to a Vector::size() which is also a size_t. I recently learned that size_t can be 32bits or 64bits depending on the system and unsigned is always 32bits. So best to be consistent and use the same types in comparisons.

> Source/WebKit/mac/Storage/WebStorageTrackerClient.mm:35
> +#import "WebStorageManagerPrivate.h"
> +#import "WebSecurityOriginInternal.h"
> +#import <wtf/MainThread.h>
> +#import <wtf/RetainPtr.h>
> +#import <WebCore/SecurityOrigin.h>
> +#import <WebCore/PlatformString.h>

Nit (style): alphabetical order. (swap first two, swap last two, and "WebCore" < "wtf")

> Source/WebKit/mac/Storage/WebStorageManagerPrivate.h:27
> +extern NSString *WebStorageDirectoryDefaultsKey;
> +extern NSString *WebStorageDidModifyOriginNotification;

const.

> Source/WebKit/mac/WebView/WebView.mm:101
>  #import "WebSystemInterface.h"
> +#import "WebStorageManagerInternal.h"

Nit (style): alphabetical order.

> Source/WebCore/WebCore.exp.in:1085
> +__ZN7WebCore14StorageTracker17initializeTrackerERKN3WTF6StringE
> +__ZN7WebCore14StorageTracker16deleteAllOriginsEv
> +__ZN7WebCore14StorageTracker12deleteOriginEPNS_14SecurityOriginE
> +__ZN7WebCore14StorageTracker7trackerEv
> +__ZN7WebCore14StorageTracker7originsERN3WTF6VectorINS1_6RefPtrINS_14SecurityOriginEEELm0EEE
> +__ZN7WebCore14StorageTracker9setClientEPNS_20StorageTrackerClientE

If the StorageTracker is going to be guarded by ENABLE(DOM_STORAGE) then these symbols should be as well. Oddly, the DatabaseTracker symbols are not guarded. I guess no mac ports compile with DOM_STORAGE or DATABASES disabled.

> Source/WebCore/storage/StorageAreaSync.cpp:347
> -
> +    

Nit: This change can be removed.

> Source/WebCore/storage/StorageTrackerClient.cpp:1
> +/*

This file should be removed. Clients will be in Source/WebKit.

> Source/WebCore/storage/StorageTracker.cpp:103
> +    if (!SQLiteFileSystem::ensureDatabaseFileExists(databasePath, createIfDoesNotExist))
> +        return;

How about a LOG_ERROR here in case of a failure. For example, "Database Path %s does not exist or cannot be accessed".

> Source/WebCore/storage/StorageTracker.cpp:133
> +    if (m_thread)
> +        m_thread->scheduleTask(LocalStorageTask::createOriginIdentifiersImport());

The code above this asserts that m_thread exists. Is this check necessary, or is it just to be safe? I see you do this pattern a lot throughout this file.

> Source/WebCore/storage/StorageTracker.cpp:179
> +    for (Vector<String>::const_iterator path = paths.begin(); path != paths.end(); ++path) {
> +        if ((*path).endsWith(fileExt, true) && (*path).length() > fileExt.length()) {

It might be best to pull paths.end() out into a local variable so it doesn't get calculated each loop iteration. Same with storing *path into a String& local variable. And finally, maybe with fileExt.length() which could be another STATIC_LOCAL since it won't change.

> Source/WebCore/storage/StorageTracker.cpp:190
> +    for (OriginSet::const_iterator it = m_originSet->begin(); it != m_originSet->end(); ++it) {

ditto for m_originSet.end();

> Source/WebCore/storage/StorageTracker.cpp:415
> +    for (unsigned i = 0; it != end; ++it, ++i)
> +        m_originsBeingDeleted->add(*it);

It doesn't look like the counter `i` is needed here.

> Source/WebCore/storage/StorageTracker.cpp:444
> +void StorageTracker::originFilePaths(Vector<String>& paths)

I don't see where this function is used. Is it intended to be used later on?

> Source/WebCore/storage/StorageNamespaceImpl.cpp:35
> +#include "StorageTracker.h"
>  #include "StorageSyncManager.h"

Nit: alphabetical order.

> Source/WebCore/storage/DatabaseTrackerClient.h:41
> +

Nit: Unnecessary change.

> Source/WebCore/storage/StorageTracker.h:44
> +class LocalStorageThread;
> +class LocalStorageTask;
> +class SecurityOrigin;
> +
> +class StorageTrackerClient;    

Nit: alphabetical order. No blank line necessary.

> Source/WebCore/platform/mac/FileSystemMac.mm:88
> +Vector<String> listDirectory(const String& path, const String& filter)

Does returning a Vector<String> here cause a copy copy constructor to be run?
Can this instead return a PassRefPtr<Vector<String> >?

> Source/WebCore/WebCore.xcodeproj/project.pbxproj:852
> +		3AC3680112EF7A09006A3D6F /* StorageTrackerClient.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 3AC367FF12EF7A09006A3D6F /* StorageTrackerClient.cpp */; };

Nit: Remember, this file should be removed.
Comment 24 David Kilzer (:ddkilzer) 2011-02-24 09:17:28 PST
Comment on attachment 83594 [details]
[PATCH] recovery from out-of-sync db, StorageTrackerClient

r- to fix build failures (probably just need to add new files to other build system files) and to consider Joe's nitpicks (most of which should be addressed).
Comment 25 Collabora GTK+ EWS bot 2011-02-24 15:37:51 PST
Attachment 83594 [details] did not build on gtk:
Build output: http://queues.webkit.org/results/7981536
Comment 26 Anton D'Auria 2011-02-24 15:49:40 PST
Comment on attachment 83594 [details]
[PATCH] recovery from out-of-sync db, StorageTrackerClient

View in context: https://bugs.webkit.org/attachment.cgi?id=83594&action=review

>> Source/WebCore/storage/StorageTracker.cpp:103
>> +        return;
> 
> How about a LOG_ERROR here in case of a failure. For example, "Database Path %s does not exist or cannot be accessed".

If createIfDoesNotExist is false, then the failure is ok. Adding LOG_ERROR if createIfDoesNotExist is true.

>> Source/WebCore/storage/StorageTracker.cpp:133
>> +        m_thread->scheduleTask(LocalStorageTask::createOriginIdentifiersImport());
> 
> The code above this asserts that m_thread exists. Is this check necessary, or is it just to be safe? I see you do this pattern a lot throughout this file.

Yes, this is no longer necessary.

>> Source/WebCore/storage/StorageTracker.cpp:444
>> +void StorageTracker::originFilePaths(Vector<String>& paths)
> 
> I don't see where this function is used. Is it intended to be used later on?

This needed to be removed. Thanks.

>> Source/WebCore/platform/mac/FileSystemMac.mm:88
>> +Vector<String> listDirectory(const String& path, const String& filter)
> 
> Does returning a Vector<String> here cause a copy copy constructor to be run?
> Can this instead return a PassRefPtr<Vector<String> >?

I agree with you, but this is the FileSystem.h interface, and I'm reluctant to make this change because it is implemented by other platforms.
Comment 27 Jeremy Orlow 2011-02-24 18:12:38 PST
(In reply to comment #26)
> (From update of attachment 83594 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=83594&action=review
> 
> >> Source/WebCore/storage/StorageTracker.cpp:103
> >> +        return;
> > 
> > How about a LOG_ERROR here in case of a failure. For example, "Database Path %s does not exist or cannot be accessed".
> 
> If createIfDoesNotExist is false, then the failure is ok. Adding LOG_ERROR if createIfDoesNotExist is true.
> 
> >> Source/WebCore/storage/StorageTracker.cpp:133
> >> +        m_thread->scheduleTask(LocalStorageTask::createOriginIdentifiersImport());
> > 
> > The code above this asserts that m_thread exists. Is this check necessary, or is it just to be safe? I see you do this pattern a lot throughout this file.
> 
> Yes, this is no longer necessary.
> 
> >> Source/WebCore/storage/StorageTracker.cpp:444
> >> +void StorageTracker::originFilePaths(Vector<String>& paths)
> > 
> > I don't see where this function is used. Is it intended to be used later on?
> 
> This needed to be removed. Thanks.
> 
> >> Source/WebCore/platform/mac/FileSystemMac.mm:88
> >> +Vector<String> listDirectory(const String& path, const String& filter)
> > 
> > Does returning a Vector<String> here cause a copy copy constructor to be run?
> > Can this instead return a PassRefPtr<Vector<String> >?

Did you mean PassOwnPtr?
 
> I agree with you, but this is the FileSystem.h interface, and I'm reluctant to make this change because it is implemented by other platforms.

The other platforms' code is checked into Source/WebKit/<port name> so you might be able to change it.
Comment 28 Anton D'Auria 2011-02-24 20:20:52 PST
Created attachment 83765 [details]
[PATCH] Testing against bots

For the bots, but also contains changes suggested by JoePeck.
Comment 29 Anton D'Auria 2011-03-02 17:59:53 PST
Created attachment 84501 [details]
[PATCH] with tests, for bots

Implemented tests for mac, which use StorageTrackerClient notifications to call LayoutTestController::notifyDone().
Comment 30 Anton D'Auria 2011-03-02 19:23:39 PST
Created attachment 84511 [details]
Patch
Comment 31 Anton D'Auria 2011-03-02 22:38:42 PST
Created attachment 84520 [details]
Patch
Comment 32 WebKit Review Bot 2011-03-02 22:47:46 PST
Attachment 84520 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/8077951
Comment 33 Joseph Pecoraro 2011-03-02 23:45:51 PST
Comment on attachment 84520 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=84520&action=review

Some drive by comments I noticed scanning through. Its unusual to have a test depend on a previous
test. It makes it impossible to run the test individually. Also, if just 1 test fails on a port
and they decide to skip it, there may then be repercussions on other tests.

> Tools/DumpRenderTree/StorageTrackerDelegate.h:33
> +- (void)logNotifications:(int)number controller:(LayoutTestController *)controller;

Nit: You nicely have numberOfNotificationsToLog as unsigned, but you take in an int here and assign it.
I'd suggest making them both int or unsigned.

> Tools/DumpRenderTree/StorageTrackerDelegate.h:35
> +- (void)setControllerToNotifyDone:(LayoutTestController *)controller;

Nit: LayoutTestController* like you have it in the instance variable declaration.
(Also change the method implementation).

> Tools/DumpRenderTree/StorageTrackerDelegate.mm:30
> +#import <WEbKit/WebSecurityOriginPrivate.h>

Nit: <WebKit/WebSecurityOriginPrivate.h>

> Tools/DumpRenderTree/qt/LayoutTestControllerQt.cpp:846
> +    // FIXME: Implement.

If I were one of these ports, I would find it helpful if there was more information here.
Maybe something like "Implement to test StorageTracker, which tracking LocalStorage
operations."
Comment 34 Maciej Stachowiak 2011-03-03 00:18:13 PST
Comment on attachment 84520 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=84520&action=review

I'm going to say r+ because all my comments were matters of style, not substance, and it looked like previous comments have been addressed. However, I think you should look at the style comments and consider addressing them. A new patch is not needed for most, but please do post a new patch if you decide to make a listDirectory() that uses POSIX calls, since code for that could use fresh review.

> Source/WebCore/page/PageGroup.cpp:140
> +void PageGroup::testForceLocalStorageSync()

What does this function do? The name is a little unclear.

OK, after seeing the comment below, I get the purpose, but I think it would help clarify to take "test" out of the name, even though the function is for testing. As it is, it sounds like it tests forcing something, but it just syncs local storage. In fact, just syncLocalStorage might be a good name, if the "force" does not add an important distinction.

> Source/WebCore/platform/mac/FileSystemMac.mm:106
> +Vector<String> listDirectory(const String& path, const String& filter)
> +{
> +    Vector<String> entries;
> +
> +    // This can run on a background thread, so must create an autorelease pool.
> +    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
> +
> +    NSDirectoryEnumerator *dir = [[NSFileManager defaultManager] enumeratorAtPath:path]; 
> +    NSString *file;
> +    while ((file = [dir nextObject])) {
> +        if (filter.isEmpty() || ([file rangeOfString:filter].location != NSNotFound))
> +            entries.append([path stringByAppendingPathComponent:file]);
> +    }
> +
> +    [pool release];
> +
> +    return entries;
> +}

I think it would be better to write this using POSIX calls and put it in FileSystemPOSIX. That's how most mac FileSystem calls are implemented; it makes the code more portable, and avoids the need for an autorelease pool.

> Source/WebCore/storage/LocalStorageTask.h:63
> +        LocalStorageTask(Type, const String&);
> +        LocalStorageTask(Type, const String&, const String&);

The String parameters here should probably have names in their declarations, since the names are not obvious from context.

> Source/WebCore/storage/StorageNamespaceImpl.cpp:168
> +void StorageNamespaceImpl::testForceSync()

Similar naming issue here - "test" is not being used as a verb, so this doesn't match WebKit naming style.

> Source/WebCore/storage/StorageTracker.h:100
> +    // FIXME: do we need an ownptr here?
> +    OwnPtr<OriginSet> m_originSet;

You're the one adding it - you should know if we need the OwnPtr! However, I think it would be fine to just have this data member be an OriginSet rather than an OwbPtr<OriginSet>. There's no obvious reason for the extra allocation and indirection.

> Source/WebCore/storage/StorageTracker.h:102
> +    OwnPtr<OriginSet> m_originsBeingDeleted;

Likewise, I think the PwnPtr here is not needed, and it can just be an OriginSet, not a smart pointer to one.

> Source/WebKit/chromium/src/StorageNamespaceProxy.cpp:97
> +    // FIXME: Implement.

Why a FIXME instead of notImplemented?

> Source/WebKit/chromium/src/StorageNamespaceProxy.cpp:102
> +    // FIXME: Implement.

Why a FIXME instead of notImplemented?

> Source/WebKit/chromium/src/StorageNamespaceProxy.cpp:107
> +    // FIXME: Implement for tests that require a sync to backing db.

Why a FIXME instead of notImplemented?

> Tools/DumpRenderTree/LayoutTestController.cpp:399
> +    // Has mac implementation

This comment seems unnecessary.

> Tools/DumpRenderTree/LayoutTestController.cpp:409
> +    // Has mac implementation

ditto

> Tools/DumpRenderTree/LayoutTestController.cpp:426
> +    // Has mac implementation

ditto

> Tools/DumpRenderTree/LayoutTestController.cpp:435
> +    // Has mac implementation

ditto
Comment 35 Jeremy Orlow 2011-03-03 01:39:56 PST
Comment on attachment 84520 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=84520&action=review

Please don't commit this without letting me take a look.  Overall, it definitely seems on track.  I have a couple major concerns though.

I still think migrating away from the manual tasks to the new template based task creation is better.  The more I think about it, the more I think it'd be best to do that in a patch that's a precursor to this, so when you're adding the new logic you're doing it in the right way.  But if you still feel strongly that you want to do it in a follow up, I guess that's OK.

Have you tried splitting this patch up at all?  It's really massive and there's a lot of subtle stuff going on.

The rest of my concerns are inline:

> Source/WebCore/storage/StorageAreaSync.cpp:371
> +    

please remove the added white space

> Source/WebCore/storage/StorageAreaSync.cpp:461
> +}        

ditto

> Source/WebCore/storage/StorageAreaSync.cpp:465
> +    syncTimerFired(&m_syncTimer);

This simply schedules a sync, but doesn't block on it finishing....does that work for your tests?  If so, then you should probably rename it as when I think "sync" I think it's synchronously syncing it to disk, not that it's scheduling work to be done asynchronously on another thread.

> Source/WebCore/storage/StorageNamespace.h:55
> +    virtual void testForceSync() = 0;

As I mentioned on IRC, I don't think this is a very good name (here and elsewhere).  Test is often used in contexts other than a function that should only be called in a test.

> Source/WebCore/storage/StorageNamespaceImpl.cpp:163
> +        RefPtr<StorageAreaImpl> storageArea = it->second;

Not sure it's really necessary to save this to a tmp variable...especially since you don't right below

> Source/WebCore/storage/StorageTracker.cpp:48
> +void StorageTracker::initializeTracker(const String& storagePath)

Am I reading this wrong, or is this essentially making it so there's one storage tracker (and associated path) for all page groups?  If so, that definitely needs to be fixed.  Is there a reason this is a singleton rather than hanging off a page group?

> Source/WebCore/storage/StorageTracker.cpp:369
> +    MutexLocker lockDatabase(m_databaseGuard);

Am I reading this wrong, or would this (and a few others) block the main thread while this IO is happening (if the main thread tries to acquire the lock)?

> Source/WebCore/storage/StorageTracker.cpp:464
> +void StorageTracker::testStorageSync()

Why so many names for basically the same concept?

>> Source/WebKit/chromium/src/StorageNamespaceProxy.cpp:97
>> +    // FIXME: Implement.
> 
> Why a FIXME instead of notImplemented?

Actually, please just put an ASSERT_NOT_REACHED() in these. I believe with your current code we can't hit it, but if we ever hit it it'd definitely be a logic error.

> LayoutTests/ChangeLog:8
> +        * storage/domstorage/localstorage/storagetracker: Added.

I'm glad you put these in a directory.  For all the platforms that don't implement the layout test controller interface, you'll need to add them to the skipped lists though, right?  If so, just add the whole directory to the list, rather than individual tests.  And be sure to leave a comment about why they're skipped.  (Note that Chromium uses platform/chromium/test_expectations.txt rather than a skip list)
Comment 36 Build Bot 2011-03-03 09:21:08 PST
Attachment 84520 [details] did not build on win:
Build output: http://queues.webkit.org/results/8083360
Comment 37 Build Bot 2011-03-03 10:16:55 PST
Attachment 84520 [details] did not build on win:
Build output: http://queues.webkit.org/results/8078677
Comment 38 Michael Nordman 2011-03-03 20:57:57 PST
Comment on attachment 84520 [details]
Patch

I've got some drive by comments.

First, why is a database needed for this? Can the list of origins using localStorage be determined by the set of directories present on disk?

View in context: https://bugs.webkit.org/attachment.cgi?id=84520&action=review

> Source/WebCore/storage/StorageAreaSync.cpp:246
> +    // reopening, so cancel possible deletion.

If all access to the localstorage databases was serialized on the same thread, racey circumstances like this, and others in this patch, would not be possible.

> Source/WebCore/storage/StorageTracker.cpp:127
> +        MutexLocker lockDatabase(m_databaseGuard);

Maybe m_databaseGuard needs a better name since it seems to guard access to more than the database?

> Source/WebCore/storage/StorageTracker.cpp:168
> +            m_client->dispatchDidModifyOrigin(*it);

Calling this 'client' interface on the background thread is highly questionable.

> Source/WebCore/storage/StorageTracker.cpp:221
> +        m_originSet->add(originIdentifier);

A non-thread-safe refcount is bumped here.

> Source/WebCore/storage/StorageTracker.cpp:351
> +    // StorageTracker db deletion.

This sounds racey. This maybe simpler and easier to maintain if all access to local storage resources on disk were serialized on a single background thread.

> Source/WebCore/storage/StorageTracker.cpp:419
> +        m_client->dispatchDidModifyOrigin(originIdentifier);

Calling this 'client' interface on a background thread is probably not a good interface. Calling out upon modification was not one of the stated goals of your patch.
- produce a list of origins using local storage
- the ability to delete them
Maybe the 'client' interface can be removed altogether.

The style of multi-threading used here is reminiscent of that seen in the Database system. I'd suggest you not model the interfaces or implementation off of that system as it has been very problematic from the beginning. So rather than have collections that are accessed on multiple threads and mutex's to gaurd them, send messages and have the messages carry all the data. And serialize everything having to do with touching 'localStorage' on disk on the same thread.

void getAllOriginsUsingLocalStorage(completionCallback)
// only called on the main thread
// posts a task to the background thread
// background thread produces such a list and sends a message to the forground thread with results
// foreground thread calls the callers callback

void deleteLocalStorageForOrigin(completionCallback)
// only called on the main thread
// ... etc

David Levin's CrossThreadCopier stuff which is integrated into the 'task' infracture takes care of making thread-safe copies of string data as messages are passed.

Does the public interface of StorageTracker need more than two methods to accomplish your stated goals?

> Source/WebCore/storage/StorageTracker.cpp:430
> +        m_originsBeingDeleted->add(*it);

A non-thread-safe refcount is bumped here when a 'copy' of the string is placed in the second collection.

> Source/WebCore/storage/StorageTracker.h:65
> +    // Sync to disk on background thread.

Some of these methods are not invoked on the background thread. There is code in at least one of the method bodies to handle being called on the main thread.

> Source/WebCore/storage/StorageTracker.h:97
> +    typedef HashSet<String> OriginSet;

WebCore Strings have a thread affinity. Under the covers the string impl employs non-thread-safe refcounting. Having these data structures accessed from multiple threads is dicey. It's not enough to serialize access to the collection, great care has to be taken to make deep copie on the way in and out while under the lock... it's far to easy to not see it... arranging for data structures to be accessed on a single thread helps.
Comment 39 Anton D'Auria 2011-03-10 01:18:12 PST
Created attachment 85296 [details]
Patch
Comment 40 Anton D'Auria 2011-03-10 01:19:28 PST
listDirectory implementation is now in FileSystemPOSIX.cpp.

A new mutex, m_originSetGuard, is used to protect the in-memory origin set so that UI can't block on m_databaseGuard.

Non-thread-safe String refcounting is done under locks.

StorageTracker is still a singleton. Both, StorageTracker and DatabaseTracker need to be PageGroup specific eventually, but this can be done at a later time. It is currently not an issue for the mac port, which is the only one that will use the StorageTracker API.

LocalStorageTask is still being used. I will remove it in a separate patch.
Comment 41 WebKit Review Bot 2011-03-10 01:25:02 PST
Attachment 85296 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/8112977
Comment 42 Jeremy Orlow 2011-03-10 01:33:17 PST
(In reply to comment #40)
> listDirectory implementation is now in FileSystemPOSIX.cpp.
> 
> A new mutex, m_originSetGuard, is used to protect the in-memory origin set so that UI can't block on m_databaseGuard.
> 
> Non-thread-safe String refcounting is done under locks.

I haven't looked at the code yet, but I'm really concerned this is going to be fragile....  :-/

> StorageTracker is still a singleton. Both, StorageTracker and DatabaseTracker need to be PageGroup specific eventually, but this can be done at a later time. It is currently not an issue for the mac port, which is the only one that will use the StorageTracker API.

Is there any strong reason not to tie it to page groups?  If not, please add some ASSERT or something that will fire if one tries to use it with multiple page groups.

> LocalStorageTask is still being used. I will remove it in a separate patch.

What's the reason for not doing it first?  It seems like it wouldn't be too hard and would simplify this patch.
Comment 43 Build Bot 2011-03-10 01:50:51 PST
Attachment 85296 [details] did not build on win:
Build output: http://queues.webkit.org/results/8117800
Comment 44 Brady Eidson 2011-03-10 10:12:56 PST
(In reply to comment #42)
> (In reply to comment #40)
> > listDirectory implementation is now in FileSystemPOSIX.cpp.
> > 
> > A new mutex, m_originSetGuard, is used to protect the in-memory origin set so that UI can't block on m_databaseGuard.
> > 
> > Non-thread-safe String refcounting is done under locks.
> 
> I haven't looked at the code yet, but I'm really concerned this is going to be fragile....  :-/

I think this has been largely modeled on DatabaseTracker wrt to the approach for multithreading.  We had concerns of fragility there, too, but none have surfaced in practice.

We're working with this API being added similarly to how we've worked with the DatabaseTracker, and will closely watch issues that come up in use.

> > StorageTracker is still a singleton. Both, StorageTracker and DatabaseTracker need to be PageGroup specific eventually, but this can be done at a later time. It is currently not an issue for the mac port, which is the only one that will use the StorageTracker API.
> 
> Is there any strong reason not to tie it to page groups?  If not, please add some ASSERT or something that will fire if one tries to use it with multiple page groups.

There's not a strong reason to not tie it to page groups, but there's also not a strong reason for Anton to go out of his way *to* tie it to page groups now.  As he mentioned, database tracker is also not tied to page groups and we haven't had a need to do so yet.

It's an organic change to do so for both the localstorage and database trackers later if the need comes up.
Comment 45 Brady Eidson 2011-03-10 10:18:44 PST
I'm giving this a review right now, but you'll need to fix the build failures.
Comment 46 Jeremy Orlow 2011-03-10 10:58:50 PST
(In reply to comment #44)
> (In reply to comment #42)
> > (In reply to comment #40)
> > > listDirectory implementation is now in FileSystemPOSIX.cpp.
> > > 
> > > A new mutex, m_originSetGuard, is used to protect the in-memory origin set so that UI can't block on m_databaseGuard.
> > > 
> > > Non-thread-safe String refcounting is done under locks.
> > 
> > I haven't looked at the code yet, but I'm really concerned this is going to be fragile....  :-/
> 
> I think this has been largely modeled on DatabaseTracker wrt to the approach for multithreading.  We had concerns of fragility there, too, but none have surfaced in practice.

This isn't true.  There have been many races there.  And michaeln is currently working on a major overhaul due to two new issues that were recently discovered.  

> We're working with this API being added similarly to how we've worked with the DatabaseTracker, and will closely watch issues that come up in use.
> 
> > > StorageTracker is still a singleton. Both, StorageTracker and DatabaseTracker need to be PageGroup specific eventually, but this can be done at a later time. It is currently not an issue for the mac port, which is the only one that will use the StorageTracker API.
> > 
> > Is there any strong reason not to tie it to page groups?  If not, please add some ASSERT or something that will fire if one tries to use it with multiple page groups.
> 
> There's not a strong reason to not tie it to page groups, but there's also not a strong reason for Anton to go out of his way *to* tie it to page groups now.  As he mentioned, database tracker is also not tied to page groups and we haven't had a need to do so yet.

As far as I know, WebSQLDatabase isn't tied to PageGroups.  DOM Storage however is.  It'd be wrong to be inconsistent throughout.

I also can't think off the top of my head of why this would be a difficult change.

If different PageGroups having different localStorage namespaces is no longer useful, we should completely pull out the feature.  (I thought the Mac WebKit framework exposed them to apps though...?)
Comment 47 Brady Eidson 2011-03-10 11:17:24 PST
(In reply to comment #46)
> (In reply to comment #44)
> > (In reply to comment #42)
> > > (In reply to comment #40)
> > > > listDirectory implementation is now in FileSystemPOSIX.cpp.
> > > > 
> > > > A new mutex, m_originSetGuard, is used to protect the in-memory origin set so that UI can't block on m_databaseGuard.
> > > > 
> > > > Non-thread-safe String refcounting is done under locks.
> > > 
> > > I haven't looked at the code yet, but I'm really concerned this is going to be fragile....  :-/
> > 
> > I think this has been largely modeled on DatabaseTracker wrt to the approach for multithreading.  We had concerns of fragility there, too, but none have surfaced in practice.
> 
> This isn't true.  There have been many races there.  And michaeln is currently working on a major overhaul due to two new issues that were recently discovered.

Interesting.  We've used the API hooked up to DatabaseTracker in Safari and haven't seen any reports of issues.  Do you have webkit bugzillas referencing these issues and this work?
> 
> > We're working with this API being added similarly to how we've worked with the DatabaseTracker, and will closely watch issues that come up in use.
> > 
> > > > StorageTracker is still a singleton. Both, StorageTracker and DatabaseTracker need to be PageGroup specific eventually, but this can be done at a later time. It is currently not an issue for the mac port, which is the only one that will use the StorageTracker API.
> > > 
> > > Is there any strong reason not to tie it to page groups?  If not, please add some ASSERT or something that will fire if one tries to use it with multiple page groups.
> > 
> > There's not a strong reason to not tie it to page groups, but there's also not a strong reason for Anton to go out of his way *to* tie it to page groups now.  As he mentioned, database tracker is also not tied to page groups and we haven't had a need to do so yet.
> 
> As far as I know, WebSQLDatabase isn't tied to PageGroups.  DOM Storage however is.  It'd be wrong to be inconsistent throughout.

"PageGroups" are a WebCore concept, not a spec concept.  We (at Apple, working on WebKit1/WebKit2 APIs) have the exact same motivations to make SQL databases tied to Page Groups as we would LocalStorage...  Maybe you have some other motivations that haven't been shared here?

> 
> I also can't think off the top of my head of why this would be a difficult change.
> 
> If different PageGroups having different localStorage namespaces is no longer useful, we should completely pull out the feature.  (I thought the Mac WebKit framework exposed them to apps though...?)

The WebKit API has absolutely no concept of local storage.  This work is the first that would directly add such a concept.

The fact that local storage namespaces are hung off of a page group is nascent - an indication of a direction we thought we'd be heading.  It's unlikely it will ever get fleshed out in the WK1 API, but we're likely to pursue making more things firmly into PageGroup-level concepts while working on the WK2 API.  That is one of them, and SQL databases is another.
Comment 48 Alexey Proskuryakov 2011-03-10 11:18:14 PST
As a design principle, we should certainly strive to not have any shared data, and prefer message passing.
Comment 49 Brady Eidson 2011-03-10 11:20:34 PST
(In reply to comment #48)
> As a design principle, we should certainly strive to not have any shared data, and prefer message passing.

I don't disagree with this.
Comment 50 Jeremy Orlow 2011-03-10 11:31:02 PST
(In reply to comment #47)
> (In reply to comment #46)
> > (In reply to comment #44)
> > > (In reply to comment #42)
> > > > (In reply to comment #40)
> > > > > listDirectory implementation is now in FileSystemPOSIX.cpp.
> > > > > 
> > > > > A new mutex, m_originSetGuard, is used to protect the in-memory origin set so that UI can't block on m_databaseGuard.
> > > > > 
> > > > > Non-thread-safe String refcounting is done under locks.
> > > > 
> > > > I haven't looked at the code yet, but I'm really concerned this is going to be fragile....  :-/
> > > 
> > > I think this has been largely modeled on DatabaseTracker wrt to the approach for multithreading.  We had concerns of fragility there, too, but none have surfaced in practice.
> > 
> > This isn't true.  There have been many races there.  And michaeln is currently working on a major overhaul due to two new issues that were recently discovered.
> 
> Interesting.  We've used the API hooked up to DatabaseTracker in Safari and haven't seen any reports of issues.  Do you have webkit bugzillas referencing these issues and this work?

Not off the top of my head.  Michael might.  Most of the issues I know of date back over a year ago.  But there were a couple recent ones caught by thread sanitizer and some other V8 work.

> > > We're working with this API being added similarly to how we've worked with the DatabaseTracker, and will closely watch issues that come up in use.
> > > 
> > > > > StorageTracker is still a singleton. Both, StorageTracker and DatabaseTracker need to be PageGroup specific eventually, but this can be done at a later time. It is currently not an issue for the mac port, which is the only one that will use the StorageTracker API.
> > > > 
> > > > Is there any strong reason not to tie it to page groups?  If not, please add some ASSERT or something that will fire if one tries to use it with multiple page groups.
> > > 
> > > There's not a strong reason to not tie it to page groups, but there's also not a strong reason for Anton to go out of his way *to* tie it to page groups now.  As he mentioned, database tracker is also not tied to page groups and we haven't had a need to do so yet.
> > 
> > As far as I know, WebSQLDatabase isn't tied to PageGroups.  DOM Storage however is.  It'd be wrong to be inconsistent throughout.
> 
> "PageGroups" are a WebCore concept, not a spec concept.  We (at Apple, working on WebKit1/WebKit2 APIs) have the exact same motivations to make SQL databases tied to Page Groups as we would LocalStorage...  Maybe you have some other motivations that haven't been shared here?

I have no motivations other than keeping the code base clean and consistent.  I'm a bit surprised you assume otherwise.

> > I also can't think off the top of my head of why this would be a difficult change.

Is it a difficult change?  It seems like it's 10-20 lines of code different...?

> > If different PageGroups having different localStorage namespaces is no longer useful, we should completely pull out the feature.  (I thought the Mac WebKit framework exposed them to apps though...?)
> 
> The WebKit API has absolutely no concept of local storage.  This work is the first that would directly add such a concept.
> 
> The fact that local storage namespaces are hung off of a page group is nascent - an indication of a direction we thought we'd be heading.  It's unlikely it will ever get fleshed out in the WK1 API, but we're likely to pursue making more things firmly into PageGroup-level concepts while working on the WK2 API.  That is one of them, and SQL databases is another.

Gotcha.  So I guess the point is that the page group logic for DOM Storage maybe never should have been added to begin with (since it's effectively dead code currently).

I think the right answer is to either make LocalStorage a singleton (and rip out the per-pageGroup logic--it can be reverted when WK2 needs it) or to make the tracker pageGroup aware (assuming you guys plan on utilizing that logic in the near future).
Comment 51 Jeremy Orlow 2011-03-10 11:34:09 PST
(In reply to comment #50)
> (In reply to comment #47)
> > (In reply to comment #46)
> > > (In reply to comment #44)
> > > > (In reply to comment #42)
> > > > > (In reply to comment #40)
> > > > > > listDirectory implementation is now in FileSystemPOSIX.cpp.
> > > > > > 
> > > > > > A new mutex, m_originSetGuard, is used to protect the in-memory origin set so that UI can't block on m_databaseGuard.
> > > > > > 
> > > > > > Non-thread-safe String refcounting is done under locks.
> > > > > 
> > > > > I haven't looked at the code yet, but I'm really concerned this is going to be fragile....  :-/
> > > > 
> > > > I think this has been largely modeled on DatabaseTracker wrt to the approach for multithreading.  We had concerns of fragility there, too, but none have surfaced in practice.
> > > 
> > > This isn't true.  There have been many races there.  And michaeln is currently working on a major overhaul due to two new issues that were recently discovered.
> > 
> > Interesting.  We've used the API hooked up to DatabaseTracker in Safari and haven't seen any reports of issues.  Do you have webkit bugzillas referencing these issues and this work?
> 
> Not off the top of my head.  Michael might.  Most of the issues I know of date back over a year ago.  But there were a couple recent ones caught by thread sanitizer and some other V8 work.

Here's the most recent one: http://code.google.com/p/chromium/issues/detail?id=71026
Comment 52 Jeremy Orlow 2011-03-10 11:43:31 PST
> > LayoutTests/ChangeLog:8
> > +        * storage/domstorage/localstorage/storagetracker: Added.
> 
> I'm glad you put these in a directory.  For all the platforms that don't implement the layout test controller interface, you'll need to add them to the skipped lists though, right?  If so, just add the whole directory to the list, rather than individual tests.  And be sure to leave a comment about why they're skipped.  (Note that Chromium uses platform/chromium/test_expectations.txt rather than a skip list)

Ping on this...

Also, in the future, can you please reply to comments inline?

J
Comment 53 Brady Eidson 2011-03-10 11:45:35 PST
(In reply to comment #50)
> (In reply to comment #47)
> > (In reply to comment #46)
> > > (In reply to comment #44)
> > > > (In reply to comment #42)
> > > > > (In reply to comment #40)
> > > > > > listDirectory implementation is now in FileSystemPOSIX.cpp.
> > > > > > 
> > > > > > A new mutex, m_originSetGuard, is used to protect the in-memory origin set so that UI can't block on m_databaseGuard.
> > > > > > 
> > > > > > Non-thread-safe String refcounting is done under locks.
> > > > > 
> > > > > I haven't looked at the code yet, but I'm really concerned this is going to be fragile....  :-/
> > > > 
> > > > I think this has been largely modeled on DatabaseTracker wrt to the approach for multithreading.  We had concerns of fragility there, too, but none have surfaced in practice.
> > > 
> > > This isn't true.  There have been many races there.  And michaeln is currently working on a major overhaul due to two new issues that were recently discovered.
> > 
> > Interesting.  We've used the API hooked up to DatabaseTracker in Safari and haven't seen any reports of issues.  Do you have webkit bugzillas referencing these issues and this work?
> 
> Not off the top of my head.  Michael might.  Most of the issues I know of date back over a year ago.  But there were a couple recent ones caught by thread sanitizer and some other V8 work.
> 
> > > > We're working with this API being added similarly to how we've worked with the DatabaseTracker, and will closely watch issues that come up in use.
> > > > 
> > > > > > StorageTracker is still a singleton. Both, StorageTracker and DatabaseTracker need to be PageGroup specific eventually, but this can be done at a later time. It is currently not an issue for the mac port, which is the only one that will use the StorageTracker API.
> > > > > 
> > > > > Is there any strong reason not to tie it to page groups?  If not, please add some ASSERT or something that will fire if one tries to use it with multiple page groups.
> > > > 
> > > > There's not a strong reason to not tie it to page groups, but there's also not a strong reason for Anton to go out of his way *to* tie it to page groups now.  As he mentioned, database tracker is also not tied to page groups and we haven't had a need to do so yet.
> > > 
> > > As far as I know, WebSQLDatabase isn't tied to PageGroups.  DOM Storage however is.  It'd be wrong to be inconsistent throughout.
> > 
> > "PageGroups" are a WebCore concept, not a spec concept.  We (at Apple, working on WebKit1/WebKit2 APIs) have the exact same motivations to make SQL databases tied to Page Groups as we would LocalStorage...  Maybe you have some other motivations that haven't been shared here?
> 
> I have no motivations other than keeping the code base clean and consistent.  I'm a bit surprised you assume otherwise.

I didn't mean motivation as in "some grand conspiracy", I meant motivation as in "reasons you think this is particularly important right now"  ;)

Keeping the code base consistent is a good thing, but it's often difficult to do while one specific area is undergoing transition.  Anton has already indicated that this is a multi-step process and he has followup work to do.

> > > I also can't think off the top of my head of why this would be a difficult change.
> 
> Is it a difficult change?  It seems like it's 10-20 lines of code different...?

Perhaps not difficult, but this patch is already huge...  That would add yet-another-aspect to the patch which is actually unrelated to the change being made.

> > > If different PageGroups having different localStorage namespaces is no longer useful, we should completely pull out the feature.  (I thought the Mac WebKit framework exposed them to apps though...?)
> > 
> > The WebKit API has absolutely no concept of local storage.  This work is the first that would directly add such a concept.
> > 
> > The fact that local storage namespaces are hung off of a page group is nascent - an indication of a direction we thought we'd be heading.  It's unlikely it will ever get fleshed out in the WK1 API, but we're likely to pursue making more things firmly into PageGroup-level concepts while working on the WK2 API.  That is one of them, and SQL databases is another.
> 
> Gotcha.  So I guess the point is that the page group logic for DOM Storage maybe never should have been added to begin with (since it's effectively dead code currently).
> 
> I think the right answer is to either make LocalStorage a singleton (and rip out the per-pageGroup logic--it can be reverted when WK2 needs it) or to make the tracker pageGroup aware (assuming you guys plan on utilizing that logic in the near future).

I think leaving LocalStorage a singleton in this patch, then either PageGroup-ifying DatabaseTracker *or* un-PageGroupify'ing LocalStorage in a followup is a fine thing to do.

> > > Interesting.  We've used the API hooked up to DatabaseTracker in Safari and haven't seen any reports of issues.  Do you have webkit bugzillas referencing these issues and this work?
> > 
> > Not off the top of my head.  Michael might.  Most of the issues I know of date back over a year ago.  But there were a couple recent ones caught by thread sanitizer and some other V8 work.
> 
> Here's the most recent one: http://code.google.com/p/chromium/issues/detail?id=71026

That bug appears to be about the SQL database thread mechanism itself, and not anything specific about DatabaseTracker, so I'm not sure that it's directly relevant to this discussion.
Comment 54 Jeremy Orlow 2011-03-10 11:46:31 PST
Comment on attachment 85296 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=85296&action=review

Few comments.  Will try to look over it more later...

> Source/WebCore/ChangeLog:48
> +        (WebCore::StorageAreaImpl::testForceSync):

This changelog seems a bit out of date

> Source/WebKit/ChangeLog:1
> +2011-03-10  Anton D'Auria  <adauria@apple.com>

get rid of the second one

> Source/WebKit/chromium/src/StorageNamespaceProxy.h:50
> +    virtual void testForceSync();

This needs to be removed.
Comment 55 Jeremy Orlow 2011-03-10 11:53:18 PST
(In reply to comment #53)
> (In reply to comment #50)
> > (In reply to comment #47)
> > > (In reply to comment #46)
> > > > (In reply to comment #44)
> > > > > (In reply to comment #42)
> > > > > > (In reply to comment #40)
> > > > > > > listDirectory implementation is now in FileSystemPOSIX.cpp.
> > > > > > > 
> > > > > > > A new mutex, m_originSetGuard, is used to protect the in-memory origin set so that UI can't block on m_databaseGuard.
> > > > > > > 
> > > > > > > Non-thread-safe String refcounting is done under locks.
> > > > > > 
> > > > > > I haven't looked at the code yet, but I'm really concerned this is going to be fragile....  :-/
> > > > > 
> > > > > I think this has been largely modeled on DatabaseTracker wrt to the approach for multithreading.  We had concerns of fragility there, too, but none have surfaced in practice.
> > > > 
> > > > This isn't true.  There have been many races there.  And michaeln is currently working on a major overhaul due to two new issues that were recently discovered.
> > > 
> > > Interesting.  We've used the API hooked up to DatabaseTracker in Safari and haven't seen any reports of issues.  Do you have webkit bugzillas referencing these issues and this work?
> > 
> > Not off the top of my head.  Michael might.  Most of the issues I know of date back over a year ago.  But there were a couple recent ones caught by thread sanitizer and some other V8 work.
> > 
> > > > > We're working with this API being added similarly to how we've worked with the DatabaseTracker, and will closely watch issues that come up in use.
> > > > > 
> > > > > > > StorageTracker is still a singleton. Both, StorageTracker and DatabaseTracker need to be PageGroup specific eventually, but this can be done at a later time. It is currently not an issue for the mac port, which is the only one that will use the StorageTracker API.
> > > > > > 
> > > > > > Is there any strong reason not to tie it to page groups?  If not, please add some ASSERT or something that will fire if one tries to use it with multiple page groups.
> > > > > 
> > > > > There's not a strong reason to not tie it to page groups, but there's also not a strong reason for Anton to go out of his way *to* tie it to page groups now.  As he mentioned, database tracker is also not tied to page groups and we haven't had a need to do so yet.
> > > > 
> > > > As far as I know, WebSQLDatabase isn't tied to PageGroups.  DOM Storage however is.  It'd be wrong to be inconsistent throughout.
> > > 
> > > "PageGroups" are a WebCore concept, not a spec concept.  We (at Apple, working on WebKit1/WebKit2 APIs) have the exact same motivations to make SQL databases tied to Page Groups as we would LocalStorage...  Maybe you have some other motivations that haven't been shared here?
> > 
> > I have no motivations other than keeping the code base clean and consistent.  I'm a bit surprised you assume otherwise.
> 
> I didn't mean motivation as in "some grand conspiracy", I meant motivation as in "reasons you think this is particularly important right now"  ;)
> 
> Keeping the code base consistent is a good thing, but it's often difficult to do while one specific area is undergoing transition.  Anton has already indicated that this is a multi-step process and he has followup work to do.

The problem with follow up patches is how often they don't happen...even when people have the best intentions to do them.  (I think most contributors are guilty of this....me even.)

I understand all too well that it can be a pain to split up patches and do them essentially backwards (doing the stuff you originally set out to do last), but it really is the best way to keep the codebase clean and stable throughout a major refactoring.  But that's probably what should be done here.

> > > > I also can't think off the top of my head of why this would be a difficult change.
> > 
> > Is it a difficult change?  It seems like it's 10-20 lines of code different...?
> 
> Perhaps not difficult, but this patch is already huge...  That would add yet-another-aspect to the patch which is actually unrelated to the change being made.
> 
> > > > If different PageGroups having different localStorage namespaces is no longer useful, we should completely pull out the feature.  (I thought the Mac WebKit framework exposed them to apps though...?)
> > > 
> > > The WebKit API has absolutely no concept of local storage.  This work is the first that would directly add such a concept.
> > > 
> > > The fact that local storage namespaces are hung off of a page group is nascent - an indication of a direction we thought we'd be heading.  It's unlikely it will ever get fleshed out in the WK1 API, but we're likely to pursue making more things firmly into PageGroup-level concepts while working on the WK2 API.  That is one of them, and SQL databases is another.
> > 
> > Gotcha.  So I guess the point is that the page group logic for DOM Storage maybe never should have been added to begin with (since it's effectively dead code currently).
> > 
> > I think the right answer is to either make LocalStorage a singleton (and rip out the per-pageGroup logic--it can be reverted when WK2 needs it) or to make the tracker pageGroup aware (assuming you guys plan on utilizing that logic in the near future).
> 
> I think leaving LocalStorage a singleton in this patch, then either PageGroup-ifying DatabaseTracker *or* un-PageGroupify'ing LocalStorage in a followup is a fine thing to do.
> 
> > > > Interesting.  We've used the API hooked up to DatabaseTracker in Safari and haven't seen any reports of issues.  Do you have webkit bugzillas referencing these issues and this work?
> > > 
> > > Not off the top of my head.  Michael might.  Most of the issues I know of date back over a year ago.  But there were a couple recent ones caught by thread sanitizer and some other V8 work.
> > 
> > Here's the most recent one: http://code.google.com/p/chromium/issues/detail?id=71026
> 
> That bug appears to be about the SQL database thread mechanism itself, and not anything specific about DatabaseTracker, so I'm not sure that it's directly relevant to this discussion.

Well, I was talking about threading issues in general.  Databases use of threading in general is way overly complicated.  And LocalStorage's definitely bordered on it.  I think this patch is really pushing it over the point where we need to re-think our approach.

Let's try to make it more message-passing-esque.
Comment 56 Michael Nordman 2011-03-10 12:38:52 PST
earler, i said
> So rather than have collections that are accessed on multiple threads and mutex's to gaurd them,
> send messages and have the messages carry all the data. And serialize everything having to do with
> touching 'localStorage' on disk on the same thread.

ap said
> As a design principle, we should certainly strive to not have any shared data, and prefer message passing.

brady said
> I don't disagree with this.

jorlow said
> Let's try to make it more message-passing-esque.


Given what's been said, seems like this patch should be revamped to use message passing instead. I think it good make it simpler too.

About the database system, look at the revision history to see the sort of problems in there. Jorlow said i was looking into overhauling things to fix a recent bug, I punted on that. There's a lot of inertia in what's there and changing it would a big effort. Saying this is modeled off of the DatabaseTracker is scary. It would be good to not let a bunch of difficult-to-work-with code accrue in the localstorage system. With that in mind...

The WebKit api exposed in this patch provides synchronous accessors to list the origins using local storage and to delete that data. Can those interfaces be async instead? The SQLDatabaseTracker has similar sync api exposed in the webkit api, and that's a part of the 'inertia' mentioned above. The sync'ness of these operations imposes difficult constraints that would otherwise not be there for normal use.
Comment 57 Anton D'Auria 2011-03-10 15:01:01 PST
Created attachment 85392 [details]
Patch
Comment 58 Brady Eidson 2011-03-10 15:05:48 PST
Since there's general agreement that a message passing model would be better for the DatabaseTracker, I filed https://bugs.webkit.org/show_bug.cgi?id=56144 to track this task.  Since changing it has been admitted to be a big effort, it is reasonable to punt on it and we can visit that bug when someone has the spare cycles.

Maciej originally r+'ed an earlier version of this patch with some style comments.  The patch as it exists today is not substantially different from that one, and in fact has improved since then.

Jeremy overrode that r+ with an r- and some comments, and it seems like the substantive ones have been addressed.

It seems that what we're left with today is a working patch improved from where it was originally r+'ed.  While there are merited objections about the model used to implement it, that isn't normally something we use as rationalization for holding up a patch.

I think it would be valuable to land this working patch today.  This allows those of us interested in the functionality to use it for now.  That experience can help guide us with iterating on it in the future.

I'm heading out for a dentist appt right now but hope to review this newest patch when I get back.
Comment 59 Anton D'Auria 2011-03-10 15:09:34 PST
(In reply to comment #52)
> > > LayoutTests/ChangeLog:8
> > > +        * storage/domstorage/localstorage/storagetracker: Added.
> > 
> > I'm glad you put these in a directory.  For all the platforms that don't implement the layout test controller interface, you'll need to add them to the skipped lists though, right?  If so, just add the whole directory to the list, rather than individual tests.  And be sure to leave a comment about why they're skipped.  (Note that Chromium uses platform/chromium/test_expectations.txt rather than a skip list)

Added storagetracker to Chromium skip list.

(In reply to comment #42)
> (In reply to comment #40)
> > StorageTracker is still a singleton. Both, StorageTracker and DatabaseTracker need to be PageGroup specific eventually, but this can be done at a later time. It is currently not an issue for the mac port, which is the only one that will use the StorageTracker API.
> 
> Is there any strong reason not to tie it to page groups?  If not, please add some ASSERT or something that will fire if one tries to use it with multiple page groups.

Added a ASSERT for only one page group.
Comment 60 WebKit Review Bot 2011-03-10 15:12:07 PST
Attachment 85392 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/8132049
Comment 61 Jeremy Orlow 2011-03-10 15:28:34 PST
(In reply to comment #58)
> Since there's general agreement that a message passing model would be better for the DatabaseTracker, I filed https://bugs.webkit.org/show_bug.cgi?id=56144 to track this task.  Since changing it has been admitted to be a big effort, it is reasonable to punt on it and we can visit that bug when someone has the spare cycles.
> 
> Maciej originally r+'ed an earlier version of this patch with some style comments.  The patch as it exists today is not substantially different from that one, and in fact has improved since then.
> 
> Jeremy overrode that r+ with an r- and some comments, and it seems like the substantive ones have been addressed.
> 
> It seems that what we're left with today is a working patch improved from where it was originally r+'ed.  While there are merited objections about the model used to implement it, that isn't normally something we use as rationalization for holding up a patch.
> 
> I think it would be valuable to land this working patch today.  This allows those of us interested in the functionality to use it for now.  That experience can help guide us with iterating on it in the future.
> 
> I'm heading out for a dentist appt right now but hope to review this newest patch when I get back.

If the tracker were opt-in, then I would agree with you.  But this code will be used by every port as is.  If it were re-worked so that things work as they do now unless a port turns it on, then I agree it'd make sense (and be safer) to put this in and iterate.

That is not the case though.  Several specific threading problems have been brought up in this thread from Michael and I taking quick looks at things.  The fact that this patch got a r+ with those in it doesn't mean that the patch is now super awesome, it means that the original r+ was perhaps too hasty.  I'm concerned that if this goes in, existing ports will be destabilized.

I will make an effort to properly review this as, but I think this is 1) too big to allow a proper and thorough review for so much subtle threading stuff and 2) is leaving too much as future work.  In addition, I think much of the future work would make this patch cleaner.

I really do empathize with Anton.  When I started working on DOM Storage I spent WAY more time trying to get people to review my code, breaking things into smaller patches, etc than I actually spent writing the code.  It's really frustrating, I know.  But in well established features like this, we need to go out of our way not to destabilize things (with subtle errors like threading issues!) rather than add new functionality.


If you want to put it in as is, please make it opt-in.  (i.e. if StorageTracker::initializeTracker is not called, then you don't hit any of the new code paths and/or put it behind a feature define.)
Comment 62 Brady Eidson 2011-03-10 16:07:16 PST
(In reply to comment #61)
> (In reply to comment #58)
> > 
> > I think it would be valuable to land this working patch today.  This allows those of us interested in the functionality to use it for now.  That experience can help guide us with iterating on it in the future.
> > 
> 
> If the tracker were opt-in, then I would agree with you.
> ...
> If you want to put it in as is, please make it opt-in.  (i.e. if StorageTracker::initializeTracker is not called, then you don't hit any of the new code paths and/or put it behind a feature define.)

This seems pretty trivial, and I think Anton is already working on it.
Comment 63 Anton D'Auria 2011-03-10 16:42:24 PST
Created attachment 85409 [details]
Patch
Comment 64 WebKit Review Bot 2011-03-10 16:50:03 PST
Attachment 85409 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/8139080
Comment 65 Build Bot 2011-03-10 17:03:32 PST
Attachment 85409 [details] did not build on win:
Build output: http://queues.webkit.org/results/8141085
Comment 66 Build Bot 2011-03-10 17:06:49 PST
Attachment 85392 [details] did not build on win:
Build output: http://queues.webkit.org/results/8148072
Comment 67 Brady Eidson 2011-03-10 17:28:24 PST
Comment on attachment 85409 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=85409&action=review

Looks good, but the build breaks and the m_isActive guard I pointed out need to be fixed.

> Source/WebCore/storage/StorageAreaSync.cpp:459
>      if (!count) {
>          query.finalize();
>          m_database.close();
> -        String databaseFilename = m_syncManager->fullDatabaseFilename(m_databaseIdentifier);
> -        if (!SQLiteFileSystem::deleteDatabaseFile(databaseFilename))
> -            LOG_ERROR("Failed to delete database file %s\n", databaseFilename.utf8().data());
> +        StorageTracker::tracker().deleteOrigin(m_databaseIdentifier);

Should guard this with the m_isActive flag, also.

> Source/WebKit/ChangeLog:18
> +2011-03-10  Anton D'Auria  <adauria@apple.com>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Add WebKit1 API to view and delete local storage
> +        https://bugs.webkit.org/show_bug.cgi?id=51878
> +
> +        * WebKit.xcodeproj/project.pbxproj:
> +
> +2011-03-02  Anton D'Auria  <adauria@apple.com>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        https://bugs.webkit.org/show_bug.cgi?id=51878
> +        Add WebKit1 API to view and delete local storage
> +
> +        * WebKit.xcodeproj/project.pbxproj:
> +

Double ChangeLog entry
Comment 68 Anton D'Auria 2011-03-10 17:38:52 PST
Created attachment 85413 [details]
Patch
Comment 69 Michael Nordman 2011-03-10 17:40:07 PST
(In reply to comment #58)
> Since there's general agreement that a message passing model would be better for the DatabaseTracker, I filed https://bugs.webkit.org/show_bug.cgi?id=56144 to track this task.  Since changing it has been admitted to be a big effort, it is reasonable to punt on it and we can visit that bug when someone has the spare cycles.
> 
> Maciej originally r+'ed an earlier version of this patch with some style comments.  The patch as it exists today is not substantially different from that one, and in fact has improved since then.
> 
> Jeremy overrode that r+ with an r- and some comments, and it seems like the substantive ones have been addressed.
> 
> It seems that what we're left with today is a working patch improved from where it was originally r+'ed.  While there are merited objections about the model used to implement it, that isn't normally something we use as rationalization for holding up a patch.
> 
> I think it would be valuable to land this working patch today.  This allows those of us interested in the functionality to use it for now.  That experience can help guide us with iterating on it in the future.
> 
> I'm heading out for a dentist appt right now but hope to review this newest patch when I get back.

Is there a rush on this?

Once the current sync WebKit api goes in and external consumers are using it... that's hard to change (inertia).

If you're interested in landing the functionality w/o caring about the quality of the impl, maybe get the outward facing WebKit api into good shape (make it async so our hands are not tied) and put a butt simple impl behind that interface (enumerate directories in the filesystem and just call delete on them). The logic in here about detecting 'deletes' in flight and avoiding them is terribly racey stuff, and the locking logic is somewhat elaborate and likely buggy in subtle ways.

It may be better to start with something simpler that is more amenable to iterating on.
Comment 70 Brady Eidson 2011-03-10 17:47:29 PST
(In reply to comment #69)
> (In reply to comment #58)
> > Since there's general agreement that a message passing model would be better for the DatabaseTracker, I filed https://bugs.webkit.org/show_bug.cgi?id=56144 to track this task.  Since changing it has been admitted to be a big effort, it is reasonable to punt on it and we can visit that bug when someone has the spare cycles.
> > 
> > Maciej originally r+'ed an earlier version of this patch with some style comments.  The patch as it exists today is not substantially different from that one, and in fact has improved since then.
> > 
> > Jeremy overrode that r+ with an r- and some comments, and it seems like the substantive ones have been addressed.
> > 
> > It seems that what we're left with today is a working patch improved from where it was originally r+'ed.  While there are merited objections about the model used to implement it, that isn't normally something we use as rationalization for holding up a patch.
> > 
> > I think it would be valuable to land this working patch today.  This allows those of us interested in the functionality to use it for now.  That experience can help guide us with iterating on it in the future.
> > 
> > I'm heading out for a dentist appt right now but hope to review this newest patch when I get back.
> 
> Is there a rush on this?
> 
> Once the current sync WebKit api goes in and external consumers are using it... that's hard to change (inertia).
> 
> If you're interested in landing the functionality w/o caring about the quality of the impl, maybe get the outward facing WebKit api into good shape (make it async so our hands are not tied) and put a butt simple impl behind that interface (enumerate directories in the filesystem and just call delete on them). The logic in here about detecting 'deletes' in flight and avoiding them is terribly racey stuff, and the locking logic is somewhat elaborate and likely buggy in subtle ways.
> 

Much like the current sync SPI, the api is not actually "synchronous".  Notice all the API calls do is schedule a background thread task in the storage tracker.  The API calls themselves return immediately.  And things within WebCore are, in fact, quite asynchronous already.
Comment 71 Build Bot 2011-03-10 18:06:02 PST
Attachment 85413 [details] did not build on win:
Build output: http://queues.webkit.org/results/8149096
Comment 72 Anton D'Auria 2011-03-10 19:06:16 PST
Created attachment 85421 [details]
Patch
Comment 73 Brady Eidson 2011-03-10 19:27:53 PST
Comment on attachment 85421 [details]
Patch

This one looks good.  I'll check in again later tonight to see if the commit queue wanted to cooperate on it.
Comment 74 Build Bot 2011-03-10 19:31:06 PST
Attachment 85421 [details] did not build on win:
Build output: http://queues.webkit.org/results/8131118
Comment 75 Anton D'Auria 2011-03-10 20:02:03 PST
Created attachment 85431 [details]
Patch
Comment 76 Brady Eidson 2011-03-10 22:50:56 PST
Comment on attachment 85431 [details]
Patch

Thanks for the Windows build fix!
Comment 77 WebKit Commit Bot 2011-03-10 23:46:55 PST
Comment on attachment 85431 [details]
Patch

Rejecting attachment 85431 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-3', 'apply-..." exit_code: 2

Last 500 characters of output:
acker/storage-tracker-4-create-expected.txt
patching file LayoutTests/storage/domstorage/localstorage/storagetracker/storage-tracker-4-create.html
patching file LayoutTests/storage/domstorage/localstorage/storagetracker/storage-tracker-5-delete-one-expected.txt
patching file LayoutTests/storage/domstorage/localstorage/storagetracker/storage-tracker-5-delete-one.html

Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', u'--reviewer', u'Brady Eidson', u'--for..." exit_code: 1

Full output: http://queues.webkit.org/results/8141185
Comment 78 Brady Eidson 2011-03-11 11:04:17 PST
(In reply to comment #77)
> (From update of attachment 85431 [details])
> Rejecting attachment 85431 [details] from commit-queue.
> 
> Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-3', 'apply-..." exit_code: 2
> 
> Last 500 characters of output:
> acker/storage-tracker-4-create-expected.txt
> patching file LayoutTests/storage/domstorage/localstorage/storagetracker/storage-tracker-4-create.html
> patching file LayoutTests/storage/domstorage/localstorage/storagetracker/storage-tracker-5-delete-one-expected.txt
> patching file LayoutTests/storage/domstorage/localstorage/storagetracker/storage-tracker-5-delete-one.html
> 
> Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', u'--reviewer', u'Brady Eidson', u'--for..." exit_code: 1
> 
> Full output: http://queues.webkit.org/results/8141185

This output as to why the commit-queue failed to run is kinda useless.
Comment 79 Jeremy Orlow 2011-03-11 11:08:55 PST
Comment on attachment 85431 [details]
Patch

feel free to re r+ if I'm wrong, but it does NOT appear that this is opt-in at the moment as we agreed upon.  I see StorageTracker::tracker() calls in methods every port calls and that looks like it starts the tracker.
Comment 80 Eric Seidel (no email) 2011-03-11 11:14:49 PST
Yes, the svn-apply failure output is useless.  We have some layering trouble getting in the way of pretty logs there.
Comment 81 Jeremy Orlow 2011-03-11 11:20:17 PST
Comment on attachment 85431 [details]
Patch

I see...the isActive flag should take care of it.
Comment 82 Brady Eidson 2011-03-11 11:28:36 PST
The commit-queue is failing here, and I know some of the issues the EWS bots had been having were simply git lock file related.  I'm prepping to land this manually, after I get a clean tree up to date then get it applied there.
Comment 83 Brady Eidson 2011-03-11 15:27:01 PST
Landed in r80892.

As the commit was rolling, I noticed we only got the tests on the chromium skipped list, not the other non-mac ones.  Working with Anton to get those in place.
Comment 84 WebKit Review Bot 2011-03-11 15:39:10 PST
http://trac.webkit.org/changeset/80892 might have broken Qt Linux Release minimal and Qt Windows 32-bit Release
Comment 85 Anton D'Auria 2011-03-11 15:47:41 PST
Created attachment 85548 [details]
Patch
Comment 86 Brady Eidson 2011-03-11 15:50:01 PST
Working on fixing the build.
Comment 87 David Levin 2011-03-11 16:01:00 PST
Comment on attachment 85431 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=85431&action=review

I gave this a quick look and had a few comments. There are some minor nits here but also some bad bugs as well.

> Source/WebCore/storage/StorageTracker.cpp:63
> +        storageTracker->importOriginIdentifiers();

This call strikes me as odd since  m_isActive is false and importOriginIdentifiers does nothing in that case.

> Source/WebCore/storage/StorageTracker.cpp:88
> +    m_storageDirectoryPath = path.threadsafeCopy();

btw, everytime you use threadsafeCopy, you should see it as a big flashing warning sign of trouble and bad design smells.

> Source/WebCore/storage/StorageTracker.cpp:92
> +{

Need ASSERT(!m_databaseGuard.tryLock()); here.

> Source/WebCore/storage/StorageTracker.cpp:181
> +

MutexLocker lockDatabase(m_databaseGuard); due to use of m_storageDirectoryPath

Also what guards your DEFINE_STATIC_LOCAL variables (especially considering that they are String's and have unsafe ref counting)?

> Source/WebCore/storage/StorageTracker.cpp:207
> +    // Delete stale StorageTracker records.

MutexLocker lockOrigins(m_originSetGuard); ?

> Source/WebCore/storage/StorageTracker.cpp:233
> +    LocalStorageTask* task = LocalStorageTask::createSetOriginDetails(originIdentifier.threadsafeCopy(), databaseFile).leakPtr();

Ideally this would be a RefPtr<LocalStorageTask> to get the PassRefPtr from createSetOriginDetails.

Actually I suspect that the task is getting deleted right now in at least one code path and the callOnMainThread call below just fails. (It should do a leakRefPtr and then scheduleTask should deref it.) Is there a test to exercise the callOnMainThread code path?

> Source/WebCore/storage/StorageTracker.cpp:239
> +        callOnMainThread(scheduleTask, (void*)task);

Use C++ style casts.

> Source/WebCore/storage/StorageTracker.cpp:248
> +        StorageTracker::tracker().m_thread->scheduleTask((LocalStorageTask*)task);

Use C++ style casts.

> Source/WebCore/storage/StorageTracker.cpp:345
> +                continue;

Over indented.

> Source/WebCore/storage/StorageTracker.cpp:491
> +    MutexLocker lockDatabase(m_databaseGuard);

Inverted lock order.  This will give you deadlocks.

> Source/WebCore/storage/StorageTracker.cpp:497
> +{

How is this thread safe with respect to the other methods that use m_client in a threaded manner like syncDeleteOrigin?

> Source/WebCore/storage/StorageTracker.h:94
> +    Mutex m_databaseGuard;

It would be nice to have a comment about what the mutex is suppose to be guarding. (I think it is the two member variables following it.)

> Source/WebCore/storage/StorageTracker.h:103
> +    Mutex m_originSetGuard;

It would be nice to have a comment about what the mutex is suppose to be guarding. (I think it is the two member variables following it.)
Comment 88 Brady Eidson 2011-03-11 16:07:31 PST
Attempt Qt build fix landed in r80898
Comment 89 Anton D'Auria 2011-03-11 20:16:31 PST
Created attachment 85564 [details]
[PATCH] protect m_client, always open tracker db on initialization, reorder lock taking

Patch addresses some of David Levin's concerns about protecting m_client and a possible deadlock caused by the order in which locks are taken. Also fixes an issue in which origin dbs are not added to the tracker until an initialization that takes place after a local storage write.
Comment 90 WebKit Review Bot 2011-03-11 20:17:50 PST
Attachment 85564 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1

Source/WebCore/storage/StorageTracker.cpp:177:  One line control clauses should not use braces.  [whitespace/braces] [4]
Total errors found: 1 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 91 Anton D'Auria 2011-03-11 21:08:52 PST
Created attachment 85568 [details]
Patch

fix style error
Comment 92 WebKit Commit Bot 2011-03-11 21:38:32 PST
Comment on attachment 85548 [details]
Patch

Clearing flags on attachment: 85548

Committed r80927: <http://trac.webkit.org/changeset/80927>
Comment 93 David Levin 2011-03-13 03:40:36 PDT
Comment on attachment 85568 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=85568&action=review

You may want to consider a new bug for this new patch as it may get hard to sort out the comments on this patch from the rest (and there have already been a lot of comments in this bug).

These following comments were not addressed. It is good to do one of several things: 
1. Address them with changes in the code.
2. Address them with an explanation of why they are incorrect. Surprising, but I am at times. :)
3. Explain why/that you'll address them later.

> Source/WebCore/storage/StorageTracker.cpp:233
> +    LocalStorageTask* task = LocalStorageTask::createSetOriginDetails(originIdentifier.threadsafeCopy(), databaseFile).leakPtr();

Ideally this would be a RefPtr<LocalStorageTask> to get the PassRefPtr from createSetOriginDetails.

Actually I suspect that the task is getting deleted right now in at least one code path and the callOnMainThread call below just fails. (It should do a leakRefPtr and then scheduleTask should deref it.) 

Is there a test to exercise the callOnMainThread code path?

> Source/WebCore/storage/StorageTracker.cpp:63
> +        storageTracker->importOriginIdentifiers();

This call strikes me as odd since  m_isActive is false and importOriginIdentifiers does nothing in that case.

> Source/WebCore/storage/StorageTracker.cpp:181
Also what guards your DEFINE_STATIC_LOCAL variables?

Specifically, static initialization isn't threadsafe.

Here are my comments about the changes in this patch:

> Source/WebCore/storage/StorageTracker.cpp:147
> +        openTrackerDatabase(true);

Why is this change being done? (Ideally, there would be a comment for this function in the ChangeLog to tell me.)

> Source/WebCore/storage/StorageTracker.cpp:150
> +

Whitespace clean ups like this in general are frowned upon. They make these diffs harder to read without significant benefit.

> Source/WebCore/storage/StorageTracker.cpp:158
> +        

Please remove this misc whitespace change.

> Source/WebCore/storage/StorageTracker.cpp:199
> +        originSetCopy = m_originSet;

This is broken. Note that OriginSet is a  HashSet<String> so the copy will just copy strings (which are basically RefPtr<StringImpl> and StringImpl is RefCounted -- which is simple ++/-- ref counting, not threadsafe).

In other words, you'll have ref counted StringImpl escaping out of your lock and that isn't threadsafe.

> Source/WebCore/storage/StorageTracker.cpp:295
> +        MutexLocker lockClient(m_clientGuard);

You are doing the null check for m_client outside of the lock. It seems like it would be good to check for it being null after the mutex is taken as well.

> Source/WebCore/storage/StorageTracker.cpp:365
> +            MutexLocker lockClient(m_clientGuard);

Ditto.

> Source/WebCore/storage/StorageTracker.cpp:476
> +        MutexLocker lockClient(m_clientGuard);

Ditto.
Comment 94 Anton D'Auria 2011-03-13 21:11:04 PDT
(In reply to comment #93)
> (From update of attachment 85568 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=85568&action=review
> 
> You may want to consider a new bug for this new patch as it may get hard to sort out the comments on this patch from the rest (and there have already been a lot of comments in this bug).
> 
> These following comments were not addressed. It is good to do one of several things: 
> 1. Address them with changes in the code.
> 2. Address them with an explanation of why they are incorrect. Surprising, but I am at times. :)
> 3. Explain why/that you'll address them later.
> 
> > Source/WebCore/storage/StorageTracker.cpp:233
> > +    LocalStorageTask* task = LocalStorageTask::createSetOriginDetails(originIdentifier.threadsafeCopy(), databaseFile).leakPtr();
> 
> Ideally this would be a RefPtr<LocalStorageTask> to get the PassRefPtr from createSetOriginDetails.
> 
> Actually I suspect that the task is getting deleted right now in at least one code path and the callOnMainThread call below just fails. (It should do a leakRefPtr and then scheduleTask should deref it.) 
> 
> Is there a test to exercise the callOnMainThread code path?

As David and I discussed on irc, this code is correct, but not very readable.

Since the primary patch for this bug has been checked in, and to simplify work on David's followup comments, I opened a new bug and posted the next patch here: https://bugs.webkit.org/show_bug.cgi?id=56285