Bug 25376

Summary: Refactor localStorage code for use in multi-process browsers
Product: WebKit Reporter: Jeremy Orlow <jorlow>
Component: WebCore Misc.Assignee: Jeremy Orlow <jorlow>
Status: RESOLVED FIXED    
Severity: Normal CC: andersca, beidson, dglazkov, fishd, michaeln, playmobil, sam, yael, zwarich
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 26629, 26658, 26698, 26732    
Bug Blocks:    
Attachments:
Description Flags
The first patch.
none
Revision of the last patch
none
Patch 2 in this set
none
A _slight_ ref of patch 1
none
Patch 3 in this set
none
Patch 4 in this set
none
Patch 5 in this set
none
patch 1 - v3
none
patch 1 - v4
fishd: review-
patch 1 - v5
none
patch 1 - v6
fishd: review+
patch 2 - v2 none

Description Jeremy Orlow 2009-04-24 13:47:24 PDT
This is my first WebKit ever bug.  I'm not sure how to assign it to myself and I'm also not exactly sure who to cc.  Anyhow...

This bug is to track the first step in getting localStorage working in Chromium.  The high level plan was discussed offline with Brady, Maciej, Anders, and Sam.

The basic idea is to have a clear split between a "front end" and a "back end" of localStorage.  The two will call each other in such a way that a proxy layer can be added in between.  In multi-process browsers (like Chromium) the proxy layer will be used to marshal the data across its IPC mechanisms.

At first glance, there's not a huge amount of work necessary.  The Storage class is more or less a front end and LocalStorageArea is more or less a back end already.  The StorageEvent code isn't currently split up, but it shouldn't be hard to do.  It might make sense to rename the *StorageArea classes to *StorageBackend since this is what they'll be.

I'll continue to update this bug as I further scope the work.  I'm hoping to start writing code for this in the next week or two.

FYI: The Chromium side of things will be tracked here: http://crbug.com/4360
Comment 1 Jeremy Orlow 2009-05-14 11:04:46 PDT
Created a proposal for this here: http://docs.google.com/Doc?id=dhs4g97m_8cwths74m 
Comment 2 Jeremy Orlow 2009-06-03 17:48:45 PDT
Created attachment 30936 [details]
The first patch.

In preparation for combining SessionStorage and LocalStorage into a single StorageManager, move the syncing bits out of LocalStorage and into its own class.
  
The next step is to combine LocalStorageArea and SessionStorageArea into one.  Then SessionStorage and LocalStorage can be combined.

Note that combining them will cut down on code paths that must be proxied, eliminate virtual dispatch, and allow Chromium to (eventually) write historical sessionStorage to disk when memory is getting tight.

Also remove a couple bits of cruft including code for quotas which is unnecessary (since a meta-data db is unnecessary since you can just count bytes as you read the local storage databases into memory).
Comment 3 Jeremy Orlow 2009-06-04 18:24:37 PDT
Created attachment 30978 [details]
Revision of the last patch

Oops, a change I was making to 'contains' (which I've since undone) accidentally made it into the last patch.  I think this fixes it.

Would really appreciate if someone could take a look!
Comment 4 Michael Nordman 2009-06-04 19:25:39 PDT
I'm really not at all familiar with the localstore code, so I'm not in a great position to give you a real review. (Not that I have review rights anyway:)


28         count bytes as you read the local storage databases into memory

This implies to me the entire db is slurped into memory. Is that right?


65     : m_path(path.copy())

nit: is the call to .copy() really necessary


68     if (!path.isEmpty())
69         m_syncManager = StorageSyncManager::create(m_path);

nit: use the same (m_)path variable in the test and create call


115     ASSERT(origin);
116     if (!origin)
117         return String();

nit: which is it, required or not? One of them can probably go.


124     return pathByAppendingComponent(m_path, origin->databaseIdentifier() + ".localstorage");

Oh, a unique database file per origin. I see this was pre-existing code.
Do you know why there is one per origin instead of one for all? Just curious.


WebCore/storage/StorageSyncManager.cpp
line 80: extra whitespace here


Comment 5 Jeremy Orlow 2009-06-04 20:47:28 PDT
(In reply to comment #4)
> I'm really not at all familiar with the localstore code, so I'm not in a great
> position to give you a real review. (Not that I have review rights anyway:)
> 
> 
> 28         count bytes as you read the local storage databases into memory
> 
> This implies to me the entire db is slurped into memory. Is that right?

Yes.  This is how it's always been.

> 
> 65     : m_path(path.copy())
> 
> nit: is the call to .copy() really necessary


This is how the original code was, but I can take a look to see if it's OK to skip the .copy().
 
> 68     if (!path.isEmpty())
> 69         m_syncManager = StorageSyncManager::create(m_path);
> 
> nit: use the same (m_)path variable in the test and create call

Oops.  Quite correct.
 
> 115     ASSERT(origin);
> 116     if (!origin)
> 117         return String();
> 
> nit: which is it, required or not? One of them can probably go.

ASSERT will tell you there's a problem in debug mode, but WebKit will crash a few lines later without the if statement.  I don't know if there are actual cases where origin could be null or if this code was "just in case".  Something I would like to do in the future is investigate this stuff, but I don't think it's a priority.

> 124     return pathByAppendingComponent(m_path, origin->databaseIdentifier() +
> ".localstorage");
> 
> Oh, a unique database file per origin. I see this was pre-existing code.
> Do you know why there is one per origin instead of one for all? Just curious.

Because there's a database per origin.

> WebCore/storage/StorageSyncManager.cpp
> line 80: extra whitespace here

Noted.

These really aren't enough changes for me to roll another patch as is, but I'll do it before this gets submitted.  Thanks for reviewing, Michael.  
Comment 6 Jeremy Orlow 2009-06-05 15:37:55 PDT
Created attachment 31014 [details]
Patch 2 in this set

I'm trying to make these patches as small and self contained as possible.  I'll continue rolling them out while I wait on reviews for earlier ones.  I'm now using git with a branch for each one, so I should be able to keep all the layers straight.....but prompt reviews are obviously going to make this easier on everyone.

This is the second patch in the set.  It takes another step towards abstracting out the sync process.  In this case, it's getting rid of all the 'internalGetItem' (and associated calls) and instead replaces them with one virtual function call that'll block until the import is complete.

I'm pretty sure that the behavior for setItem, removeItem, and clear are NOT safe.  getItem and contains do have a small optimization that is legit, but I'm don't think it helps a whole lot in practice since, if it's not present right away, it'll block until the import is complete.

I've made a FIXME comment about possible optimizations.  I'd like to implement this after a couple more rounds of refactoring.

Unlike most svn patches, this needs to be applied with 'patch -p1' instead of 'patch -p0'.  I'll create a ChangeLog and finalize this patch once the first one is reviewed.  I'm putting this (and probably more soon) up mostly as a FYI.
Comment 7 Jeremy Orlow 2009-06-05 16:04:44 PDT
Created attachment 31016 [details]
A _slight_ ref of patch 1

A couple tiny style changes from the last patch.

This is also now generated from 'git diff' which means that StorageSyncManager.* shows up just as an add rather than a copy from LocalStorage.*.  This is significant because much of the code in this new file is actually unmodified...so the reviewers job is easier than it might otherwise appear.  :-)
Comment 8 Jeremy Orlow 2009-06-05 16:32:11 PDT
I figured it might help reviewers if I explained what I'm doing in excruciating detail.

High level overview of structure:

*DOMWindow code*
Calls LocalStorage/SessionStorage to get a storage area
Creates a Storage object and returns it to the javascript implementation

*LocalStorage/SessionStorage code*
Keeps track of *StorageArea objects associated with each origin
Handles part of the syncing issues

*StorageArea/LocalStorageArea/SessionStorageArea*
Uses StorageMap to keep track of the actual data
Dispatches events
Handles the rest of the syncing and concurrency issues

*StorageMap*
Keeps the string -> string map that is DOM Storage


Where I'm headed:

I want to reorganize everything by function.  I want to have the event dispatching code in one place.  I want the syncing code in one place.  And I want the data mapping in one place.  To do this, I intend to pull the factory part of LocalStorage/SessionStorage into a StorageManager.  I want to pull the syncing part of LocalStorage into a StorageSyncManager.  I want to pull LocalStorageArea syncing code either into that manager or (more likely) a StorageAreaSync class.  I want to move the event dispatching into a StorageEventDispatcher class.

This is important for setting up proxies across processes.  For the StorageArea, StorageManager, and their helpers, the "real" implementation will live in Chromium's browser process (the main, unsandboxed one).  For StorageEventDispatcher, the "real" implementation will live in the renderer process and its proxy will live in the browser process.  If this doesn't make sense, let me know and I'll try to flesh it out further (or update the design doc :-)


As of right now, here's what I have planned in terms of patches.  This will probably change, but hopefully it can give you a high level overview of where I'm going with this:

1: Pull LocalStorage sync code into its own class.
2: Clean up the LocalStorageCode for handling the initial import.
3: Pull the rest of the sync code out of localStorageArea.
4: Pull the event dispatching code out of *StorageArea.
5: Combine LocalStorageArea and SessionStorageArea.  Combine LocalStorage and SessionStorage into StorageManager.  All of the heavy lifting will have already been done in other patches.
6: Made the actual proxy split.


Once again, if you have questions PLEASE ASK!  I really want to keep moving forward on this, so I'll take whatever time necessary to explain what I'm doing.  :-)
Comment 9 Jeremy Orlow 2009-06-05 19:11:27 PDT
Created attachment 31024 [details]
Patch 3 in this set

Patch for item #3.  (Moving the rest of the syncing code out of LocalStorageArea.)

Once #1 and #2 are in, I'll make a ChangeLog for this one and add the new file to the visual studio project, make file, etc.
Comment 10 Jeremy Orlow 2009-06-09 01:12:11 PDT
Created attachment 31085 [details]
Patch 4 in this set

Combine LocalStorageArea and SessionStorageArea into StorageArea.  Next up, combine LocalStorage and SessionStorage into StorageManager.
Comment 11 Jeremy Orlow 2009-06-12 00:13:37 PDT
Created attachment 31193 [details]
Patch 5 in this set

Merge SessionStorage and LocalStorage into a single StorageNamespace class (since that's essentially what they both were to begin with).

The next step is to add a proxy for StorageNamespace and StorageArea.
Comment 12 Cameron Zwarich (cpst) 2009-06-15 23:38:17 PDT
Why do we need some complicated multiprocess sync code in all local storage code when there is no way to build the WebKit tree in a multiprocess form?
Comment 13 Jeremy Orlow 2009-06-16 05:16:44 PDT
Chromium is a WebKit based browser that makes use of a multi-process based architecture.  You can find more details here: http://dev.chromium.org/developers/design-documents/multi-process-architecture

Please take a look at the patches I've posted so far.  I think the architecture is actually simpler after them.  Stuff like syncing code is broken out into its own area.  What remained could then be combined and some virtual dispatch and tricks like having an "internalFunctionFoo" could be eliminated.

As for "some complicated multiprocess sync code," I actually haven't posted any multi-process specific code yet.  It will make things more complex, but it's important so that Chromium and the rest of WebKit can continue sharing as much code as possible.

A couple of us met with a bunch of Apple guys (all of them in the cc list except for Maciej) to talk through the general strategy before.  One of the things we talked about was moving in the direction where it's easier for others to use WebKit in a multi-process manner.  We've also discussed some of the details on the webkit-dev mailing list.  (I can dig up some of the thread names if that'd be helpful.)

I think the benefits of sharing code will outweigh the negatives of added complexity.  And I think the 5 patches I've posted so far will help reduce this added complexity.
Comment 14 Jeremy Orlow 2009-06-16 15:56:30 PDT
Created attachment 31389 [details]
patch 1 - v3

Synced the repo and generated the patch with svn-create-patch.  This should help the readability of StorageSyncManager.* (since it started as a copy of LocalStorage.*).
Comment 15 Jeremy Orlow 2009-06-16 17:44:03 PDT
Created attachment 31395 [details]
patch 1 - v4

Added ENABLE(DOM_STORAGE) guards around the new files.
Comment 16 Jeremy Orlow 2009-06-16 18:45:19 PDT
Note that I've generated newer versions of the other patches (after syncing with head and adding guards) but they're almost the same, so I'm not going to bother uploading them unless someone expresses any amount of interest.
Comment 17 Darin Fisher (:fishd, Google) 2009-06-17 22:09:51 PDT
Comment on attachment 31395 [details]
patch 1 - v4

This refactoring seems like a good idea to me.  Reading over this change,
I spotted some stylistic things you'll want to fix.  Marking R- on account
of those.


> Index: WebCore/storage/LocalStorage.cpp
...
>  LocalStorage::LocalStorage(const String& path)
>      : m_path(path.copy())
> +    , m_syncManager(0)

nit: no need for this line since RefPtr's default constructor is equivalent.
it is commonplace in webcore to exclude lines like this.


>  {
> -    // If the path is empty, we know we're never going to be using the thread for anything, so don't start it.
> -    // In the future, we might also want to consider removing it from the DOM in that case - <rdar://problem/5960470>
> -    if (path.isEmpty())
> -        return;

Can you comment on why it makes sense to delete this comment?


> @@ -92,54 +88,18 @@ PassRefPtr<StorageArea> LocalStorage::st
...
> -    storageArea = LocalStorageArea::create(origin, this);
> +    

nit: ^^^ looks like some extraneous white spaces on that new line.


> Index: WebCore/storage/LocalStorageArea.cpp
...
>  void LocalStorageArea::scheduleFinalSync()
>  {
> +    ASSERT(isMainThread());
> +    if (!m_syncManager)
> +        return;
> +    

nit: ^^^ looks like some extraneous white spaces on that new line.


> Index: WebCore/storage/StorageSyncManager.cpp
...
> +void StorageSyncManager::close()
...
> +    

nit: ^^^ looks like some extraneous white spaces on that new line.


> +bool StorageSyncManager::scheduleImport(PassRefPtr<LocalStorageArea> area)
>  {
>      ASSERT(isMainThread());
> -
> +    

nit: ^^^ looks like some extraneous white spaces on that new line.


> +void StorageSyncManager::scheduleSync(PassRefPtr<LocalStorageArea> area)
>  {
>      ASSERT(isMainThread());
> +    

nit: ^^^ looks like some extraneous white spaces on that new line.


>      if (m_thread)
>          m_thread->scheduleSync(area);
>  }

> Index: WebCore/storage/StorageSyncManager.h
...
> +    class StorageSyncManager: public ThreadSafeShared<StorageSyncManager> {

nit: need a space before the ":"
Comment 18 Jeremy Orlow 2009-06-18 12:54:07 PDT
Created attachment 31504 [details]
patch 1 - v5

(In reply to comment #17)
> >  {
> > -    // If the path is empty, we know we're never going to be using the thread for anything, so don't start it.
> > -    // In the future, we might also want to consider removing it from the DOM in that case - <rdar://problem/5960470>
> > -    if (path.isEmpty())
> > -        return;
> 
> Can you comment on why it makes sense to delete this comment?

The comment just doesn't really fit in anywhere now.  The patch check is gone since the manager now handles that stuff and the comment doesn't reflect the current thinking.  (https://bugs.webkit.org/show_bug.cgi?id=25894)

Once the re-factoring is done, it'll be easier to fix the behavior for WebKit while allowing browsers (with different philosophies) to opt out.  Doing so earlier will just make the diffs a lot more messy.  And given that this has been a problem since the beginning, I don't think it's a big deal to wait a bit on it.  :-)

> > @@ -92,54 +88,18 @@ PassRefPtr<StorageArea> LocalStorage::st
> ...
> > -    storageArea = LocalStorageArea::create(origin, this);
> > +    
> 
> nit: ^^^ looks like some extraneous white spaces on that new line.

The spacing is as it was originally...I just removed an extra tab it appears.  I think the spacing is proper as it is now.

> 
> 
> > Index: WebCore/storage/LocalStorageArea.cpp
> ...
> >  void LocalStorageArea::scheduleFinalSync()
> >  {
> > +    ASSERT(isMainThread());
> > +    if (!m_syncManager)
> > +        return;
> > +    
> 
> nit: ^^^ looks like some extraneous white spaces on that new line.

I guess I could remove it, but that code is logically grouped together.  It's doing sanity checks and deciding if it should bail completely...essentially testing assumptions which are made throughout the rest of the function.  I don't like how it reads when you remove the space, but I'll do it if you insist.

> 
> 
> > Index: WebCore/storage/StorageSyncManager.cpp
> ...
> > +void StorageSyncManager::close()
> ...
> > +    
> 
> nit: ^^^ looks like some extraneous white spaces on that new line.
> 
> 
> > +bool StorageSyncManager::scheduleImport(PassRefPtr<LocalStorageArea> area)
> >  {
> >      ASSERT(isMainThread());
> > -
> > +    
> 
> nit: ^^^ looks like some extraneous white spaces on that new line.
> 
> 
> > +void StorageSyncManager::scheduleSync(PassRefPtr<LocalStorageArea> area)
> >  {
> >      ASSERT(isMainThread());
> > +    
> 
> nit: ^^^ looks like some extraneous white spaces on that new line.

Same as above...I'm putting a space between where I'm testing assumptions and where the real code is.  This was done often in the code I started with, but I did add in a bunch of spaces to make it all consistent.

I went ahead and made all the changes you asked for because making progress is more important than these nits, but I think they've decreased readability and consistency.
Comment 19 Jeremy Orlow 2009-06-19 16:32:39 PDT
Created attachment 31571 [details]
patch 1 - v6

Realized I misinterpreted Darin's comments.  Added back in the new lines without the trailing spaces.

I'm pretty sure this is ready to +.  :-)
Comment 20 Brady Eidson 2009-06-22 14:56:25 PDT
Sorry I've been absent through this whole process - I'd like to take a thorough look before this lands, which I am doing right now.  I'll be more available from here on out!
Comment 21 Darin Fisher (:fishd, Google) 2009-06-22 15:03:31 PDT
> Sorry I've been absent through this whole process - I'd like to take a thorough
> look before this lands, which I am doing right now.  I'll be more available
> from here on out!

Looks like I just missed your message :(  I went ahead and committed this patch as http://trac.webkit.org/changeset/44959.

I'm happy to revert or revise as you suggest.
Comment 22 Jeremy Orlow 2009-06-22 15:13:29 PDT
If it's ok, I'd like to keep this in for now (so that I can prepare the next patch).  These patches are really affecting my ability to get LocalStorage working in Chromium, and that in turn is holding up other people/projects (like extensions).  I really did want you feedback before we started checking anything in, but I felt somewhat backed into a corner.

There still are plenty of other patches to fix any issues you might have with the first patch, and we can always back the first out if there's some major architectural flaw.

As I've mentioned (via private mail) I'm more than happy to come by Apple to talk these patches through in person, on IRC, or whatever.  I know in person isn't normal, but it'd probably be most efficient (and an excuse for me to ride my motorcycle :-).
Comment 23 Brady Eidson 2009-06-22 15:44:32 PDT
None of that was the issue - timing and availability was all.

Please keep me CC'ed on future patches going forward.
Comment 24 Jeremy Orlow 2009-06-22 15:46:53 PDT
Created attachment 31685 [details]
patch 2 - v2

Simplify the interaction between LocalStorageArea/SessionStorageArea and StorageArea by creating a "blockUntilImportComplete()" function rather than bouncing back and forth between the child and parent classes in a somewhat unintuitive manner.

This patch also paves the way for LocalStorageArea and SessionStorageArea being merged into one.  It's part of several in a set which are separating syncing (to disk) code from the rest of the implementation so that local storage and session storage's code can be unified.
Comment 25 Brady Eidson 2009-06-22 15:55:07 PDT
In general we try not to keep reusing the same back for a sequence of patches.  One bug -> one patch makes it possible to track changes.  Could we try opening a new bug for the next patch(es) in the sequence please?
Comment 26 Jeremy Orlow 2009-06-22 15:57:43 PDT
So f(In reply to comment #23)
> Please keep me CC'ed on future patches going forward. 

Pretty much all the work is happening in this bug, but I'll definitely keep that in mind in the future.

Here are the only bugs about LocalStorage that _might_ have been of interest:
https://bugs.webkit.org/show_bug.cgi?id=26356
https://bugs.webkit.org/show_bug.cgi?id=25970

These are even less related:
https://bugs.webkit.org/show_bug.cgi?id=26154
https://bugs.webkit.org/show_bug.cgi?id=26180

P.S. It'd be awesome if you can take a look at the next patch today.  (It's pretty straightforward.)
Comment 27 Dimitri Glazkov (Google) 2009-06-22 16:01:25 PDT
Jeremy, stop monopolizing bradee-oh's attention! :) We need him for this bug: https://bugs.webkit.org/show_bug.cgi?id=26054
Comment 28 Jeremy Orlow 2009-06-22 16:02:38 PDT
Moved discussion of the second patch to a new thread as requested: https://bugs.webkit.org/show_bug.cgi?id=26629

I'm leaving patches 3-5 here for reference (until I cut a "real" version of each).
Comment 29 Jeremy Orlow 2009-06-23 13:47:15 PDT
Adding the broken out bugs as dependencies.
Comment 30 Eric Seidel (no email) 2009-06-29 13:20:30 PDT
So is this bug still waiting to be landed?  Or should it be closed?