Bug 244058 - [GLib] New API should use UnifiedOriginStorageLevel::Basic
Summary: [GLib] New API should use UnifiedOriginStorageLevel::Basic
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Michael Catanzaro
URL:
Keywords:
Depends on:
Blocks: GTK4
  Show dependency treegraph
 
Reported: 2022-08-17 14:02 PDT by Michael Catanzaro
Modified: 2023-02-16 09:28 PST (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Catanzaro 2022-08-17 14:02:55 PDT
In future API versions, where WebKitWebsiteDataManager's localstorage and IndexedDB directory configuration APIs are removed, we can change WebsiteDataStore::defaultShouldUseCustomStoragePaths to allow WPE and GTK to use the new general storage directory instead of the legacy custom localstorage and IndexedDB directories.
Comment 1 Carlos Garcia Campos 2023-02-01 01:04:28 PST
It was removed in https://commits.webkit.org/258591@main
Comment 2 Michael Catanzaro 2023-02-01 07:58:00 PST
I quickly skimmed through that commit.

 * Old WPE/GTK API has to keep using UnifiedOriginStorageLevel::None to support use of custom directories for different data types.
 * New WPE/GTK API no longer allows setting custom directories for every data type, and only allows configuring the base data and base cache directories. So it should use at least UnifiedOriginStorageLevel::Basic. This should partition everything except cache?
 * We certainly want to partition cache data too. Is that the difference between UnifiedOriginStorageLevel::Basic vs. UnifiedOriginStorageLevel::Standard? Need to look closer. Apple is now putting cache under their origin directory, commingling cache and data. This would not be desirable for us because we should maintain clear separation between cache and data. I think we want one origin storage directory for cache and one for data.
Comment 3 Sihui Liu 2023-02-01 10:21:34 PST
(In reply to Michael Catanzaro from comment #2)
> I quickly skimmed through that commit.
> 
>  * Old WPE/GTK API has to keep using UnifiedOriginStorageLevel::None to
> support use of custom directories for different data types.
>  * New WPE/GTK API no longer allows setting custom directories for every
> data type, and only allows configuring the base data and base cache
> directories. So it should use at least UnifiedOriginStorageLevel::Basic.
> This should partition everything except cache?

Technically UnifiedOriginStorageLevel only indicates whether to the "General" directory for data types.
In our previous implementation, the directory structure looks like:
- WebsiteData/
  - CacheStorage
    - [Hashed Partitioned Origin]/
      - [Origin File]
      - [Record]/

Each type has its own directory, and origin file if the the origin directory name is hashed.
We later introduced a General directory, and the disk layout becomes:
- WebsiteData/
  - [General]
    - [Hashed Partitioned Origin]
      - [Origin File]
      - CacheStorage
        - [Record]
      - IndexedDB
      - LocalStorage
With new layout, we could eliminate unnecessary origin files and directories (good for backup). And it will be easier to apply same policy to all types (e.g. when a new type is added, we can just add another [Type] directory under partitioned origin directory without considering partitioning), and perform storage management (quote computation, data deletion, etc).

The new layout involves data migration (from custom directory to new directory) and some other work (refactoring, new architecture, testing, etc), so we are doing this type by type. We might not use the general directory for all types in the end, since some types use one single file for all origins for performance, and some types have different partition strategy, e.g. cookies.

Basically UnifiedOriginStorageLevel indicates the stage we are right now:
UnifiedOriginStorageLevel::None - don't use general directory for existing types
UnifiedOriginStorageLevel::Basic - use it for IndexedDB, LocalStorage
UnifiedOriginStorageLevel::Standard - use it for IndexedDB, LocalStorage, and CacheStorage
We are working towards UnifiedOriginStorageLevel::Standard now.
(Note FileSystem is a new type added after General directory is introduced, so it will be under this directory regardless of the level).

>  * We certainly want to partition cache data too. Is that the difference
> between UnifiedOriginStorageLevel::Basic vs.
> UnifiedOriginStorageLevel::Standard? Need to look closer. Apple is now
> putting cache under their origin directory, commingling cache and data. This
> would not be desirable for us because we should maintain clear separation
> between cache and data. I think we want one origin storage directory for
> cache and one for data.

If an existing type is partitioned now, it will be partitioned and UnifiedOriginStorageLevel only decides where the data is. As shown in example above, CacheStorage data is partitioned no matter it is Basic or Standard.

CacheStorage is not "cache", as its lifetime should be controlled by web clients (https://w3c.github.io/ServiceWorker/#cache-lifetimes), like IndexedDB.

For the other real cache types (e.g. http cache), we currently have no plan to move them. It might make sense to have another [General Cache] directory in parallel with [General] if we are going to do this.
Comment 4 Michael Catanzaro 2023-02-01 15:26:42 PST
(In reply to Sihui Liu from comment #3)
> CacheStorage is not "cache", as its lifetime should be controlled by web
> clients (https://w3c.github.io/ServiceWorker/#cache-lifetimes), like
> IndexedDB.

OK, surprise! I suppose that makes sense, because it looks like we want this data to persist even if users delete their cache dirs. (Users will report bugs complaining that we've put CacheStorage in a data directory rather than a cache directory, but oh well.)

Unfortunately, WebsiteDataStoreConfiguration::initializePaths does not agree, nor does WebsiteDataStoreConfiguration::defaultCacheStorageDirectory. So we've got a migration problem.

(There's also WebsiteDataStore::defaultApplicationCacheDirectory which is classed as cache rather than data. But surely that should be removed at some point, as it's deprecated in favor of service workers?)
Comment 5 Sihui Liu 2023-02-01 16:06:33 PST
(In reply to Michael Catanzaro from comment #4)
> (In reply to Sihui Liu from comment #3)
> > CacheStorage is not "cache", as its lifetime should be controlled by web
> > clients (https://w3c.github.io/ServiceWorker/#cache-lifetimes), like
> > IndexedDB.
> 
> OK, surprise! I suppose that makes sense, because it looks like we want this
> data to persist even if users delete their cache dirs. (Users will report
> bugs complaining that we've put CacheStorage in a data directory rather than
> a cache directory, but oh well.)
> 
> Unfortunately, WebsiteDataStoreConfiguration::initializePaths does not
> agree, nor does WebsiteDataStoreConfiguration::defaultCacheStorageDirectory.
> So we've got a migration problem.

Yes, we treated CacheStorage as cache, and got report from web developers about CacheStorage data disappears (since cache can be evicted in certain conditions, but data will not), so we are going to fix this by migration to general directory.

> 
> (There's also WebsiteDataStore::defaultApplicationCacheDirectory which is
> classed as cache rather than data. But surely that should be removed at some
> point, as it's deprecated in favor of service workers?)

For deprecated types I think we are going to leave them as it is now, and delete their data (and related code) at some point.
Comment 6 Michael Catanzaro 2023-02-03 08:45:39 PST
In that case, it sounds safe for us to use UnifiedOriginStorageLevel::Standard in the new GTK/WPE API version, and stick with UnifiedOriginStorageLevel::None for the old API version.

I guess we'd have a one-time loss of CacheStorage data since there is no migration yet, but I think that's OK.
Comment 7 Michael Catanzaro 2023-02-06 10:22:11 PST
Sorry, I misread Sihui's comment and didn't look closely enough at the code. I see WebKit really does handle the migration, so even better.
Comment 8 Michael Catanzaro 2023-02-06 10:56:05 PST
I notice the default UnifiedStorageLevel in NetworkSessionParameters is UnifiedOriginStorageLevel::Basic, but in WebsiteDataStore::defaultUnifiedOriginStorageLevel it is UnifiedOriginStorageLevel::None. I understand the WebsiteDataStore's value takes precedence, but that's certainly a little confusing.
Comment 9 Michael Catanzaro 2023-02-15 13:26:24 PST
Pull request: https://github.com/WebKit/WebKit/pull/10168
Comment 10 EWS 2023-02-16 09:28:41 PST
Committed 260375@main (c5ec4e0204d8): <https://commits.webkit.org/260375@main>

Reviewed commits have been landed. Closing PR #10168 and removing active labels.