Bug 202483 - IndexedDB: include size of index records in size estimate of put/add task
Summary: IndexedDB: include size of index records in size estimate of put/add task
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Sihui Liu
URL:
Keywords: InRadar
Depends on: 201957
Blocks:
  Show dependency treegraph
 
Reported: 2019-10-02 11:04 PDT by Sihui Liu
Modified: 2019-10-09 15:21 PDT (History)
8 users (show)

See Also:


Attachments
Patch (38.03 KB, patch)
2019-10-02 11:59 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (38.01 KB, patch)
2019-10-02 14:21 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff
test.html (1.91 KB, text/html)
2019-10-02 18:59 PDT, Sihui Liu
no flags Details
Patch (64.89 KB, patch)
2019-10-02 20:32 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (2.24 KB, patch)
2019-10-04 17:37 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (2.24 KB, patch)
2019-10-07 08:46 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (9.75 KB, patch)
2019-10-09 13:07 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch for landing (9.74 KB, patch)
2019-10-09 14:16 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch for landing (9.80 KB, patch)
2019-10-09 14:19 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sihui Liu 2019-10-02 11:04:41 PDT
...
Comment 1 Sihui Liu 2019-10-02 11:59:37 PDT
Created attachment 380045 [details]
Patch
Comment 2 Geoffrey Garen 2019-10-02 12:44:58 PDT
Comment on attachment 380045 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=380045&action=review

> Source/WebCore/Modules/indexeddb/shared/IDBError.cpp:70
> +uint64_t IDBError::taskSize() const
> +{
> +    ASSERT(isNull() || (!isNull() && !m_size));
> +    return m_size;
> +}

It's an unfortunate / surprising design to use an error object to communicate a successful result (size). Usually, an error communicates that something failed. But in this case, you're using it to communicate that something succeeded.

Can you return the size of the task some other way? Or Make an abstraction like a Variant that's a either an error or a size, depending on whether the task succeeded?
Comment 3 Geoffrey Garen 2019-10-02 12:52:36 PDT
Comment on attachment 380045 [details]
Patch

Is the purpose of this design to avoid storing something too big, or to avoid prompting the user too much?

Can you include a test case that demonstrates the bug we're trying to fix by being more precise? It's probably easier to talk about whether a test passes or fails than it is to discuss design in the abstract.

If the purpose of this design is to avoid storing something too big, I'm not sure we've achieved that goal. An item that is estimated to be small, but turns out to be big, will be stored. Later, we will update our estimate to know that we've gone over quota. But by that time, we're already over quota, possibly by a lot. So, I'd like to understand what we're trying to do with this more precise estimate.
Comment 4 Sihui Liu 2019-10-02 14:21:36 PDT
Created attachment 380059 [details]
Patch
Comment 5 Sihui Liu 2019-10-02 18:59:45 PDT
Created attachment 380076 [details]
test.html
Comment 6 Sihui Liu 2019-10-02 19:17:40 PDT
(In reply to Geoffrey Garen from comment #3)
> Comment on attachment 380045 [details]
> Patch
> 
> Is the purpose of this design to avoid storing something too big, or to
> avoid prompting the user too much?
> 
> Can you include a test case that demonstrates the bug we're trying to fix by
> being more precise? It's probably easier to talk about whether a test passes
> or fails than it is to discuss design in the abstract.
> 
Test file uploaded.

For each add operation, estimated task size before execution is about 58KB, actual task size reported after execution is about 1.09MB. Database file size after db.close() is about 9MB.

> If the purpose of this design is to avoid storing something too big, I'm not
> sure we've achieved that goal. An item that is estimated to be small, but
> turns out to be big, will be stored. Later, we will update our estimate to
> know that we've gone over quota. But by that time, we're already over quota,
> possibly by a lot. So, I'd like to understand what we're trying to do with
> this more precise estimate.

That's true. This is an effort to reduce effect of this problem but not to solve the problem completely.

One case where updating the size after task execution could be meaningful is when we have multiple transactions.
As in the test file, it executes two transactions. When we set the quota to be 1MB and do not update the task size, both transactions would succeed. When we update the task size estimate, the second transaction would fail as the estimated usage would be updated at the end of transaction.

Bette size estimate also helps the case where one origin has multiple databases or multiple storage types for the same reason.
Comment 7 Sihui Liu 2019-10-02 20:32:07 PDT
Created attachment 380077 [details]
Patch
Comment 8 youenn fablet 2019-10-03 00:01:14 PDT
Comment on attachment 380077 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=380077&action=review

> Source/WebCore/Modules/indexeddb/shared/IDBError.h:100
> +class IDBErrorOrSize {

I would rename it to something like OperationSizeOrError or TaskSizeOrError.
An Expected<uint64_t, IDBError> might be better in principle, it might require adding some makeUnexpected.
Otherwise a Variant.
Comment 9 Geoffrey Garen 2019-10-03 21:25:02 PDT
> > Can you include a test case that demonstrates the bug we're trying to fix by
> > being more precise? It's probably easier to talk about whether a test passes
> > or fails than it is to discuss design in the abstract.
> > 
> Test file uploaded.
> 
> For each add operation, estimated task size before execution is about 58KB,
> actual task size reported after execution is about 1.09MB. Database file
> size after db.close() is about 9MB.

Thank you!

I think this test demonstrates two issues:

(1) In the worst case, when we store an object, we replicate the store in every index.

(2) In the worst worst case, SQLite may use even more space than (1), due to internal implementation details.

It looks like this patch attempts to address (1) but not (2). So, let's set aside (2) for the moment, and treat it as a separate bug. (Probably the fix for that bug will be to use "PRAGMA auto_vacuum = FULL" or similar.)

In the case of (1), it looks like this patch attempts to address the subset of cases where the indexing overhead accumulates gradually over many stores, but not immediately in one store. So, if I set indexCount to 1000000, or I change createObject() to call randomArray(1000000)), this patch does not stop me.

Is that right?

Consider this algorithm, which I'll name INDEX_ESTIMATE_SYNC:

bytes = (object size) * (number of indices)
if (bytes > bytesAvailable)
    bytesAvailable = synchronously read file sizes on disk
    if (bytes > bytesAvailable)
        bytesAvailable = ask user for more space
    if (bytes > bytesAvailable)
        return an error
bytesAvailable -= bytes

And this algorithm, which I'll name INDEX_ESTIMATE_ASYNC:

bytes = (object size) * (number of indices)
if (bytes > bytesAvailable)
    if (bytesAvailableNeedsUpdate)
        schedule a task:
            bytesAvailable = synchronously read file sizes on disk
            bytesAvailableNeedsUpdate = false
            restart this algorithm
    else
        bytesAvailable = ask user for more space
        if (bytes > bytesAvailable)
            return an error

bytesAvailableNeedsUpdate = true
bytesAvailable -= bytes

Is INDEX_ESTIMATE or INDEX_ESTIMATE_ASYNC practical to implement in our database task system?

I believe that INDEX_ESTIMATE_* would solve the problem in (1) in all cases. I also believe it has the benefit of being a little more straightforward. The process of estimating a size, reserving space for it, and then passing back the difference between the estimated size and the true size (which is itself still an estimate) is challenging to follow.

What do you think?
Comment 10 Sihui Liu 2019-10-04 01:02:17 PDT
(In reply to Geoffrey Garen from comment #9)
> > > Can you include a test case that demonstrates the bug we're trying to fix by
> > > being more precise? It's probably easier to talk about whether a test passes
> > > or fails than it is to discuss design in the abstract.
> > > 
> > Test file uploaded.
> > 
> > For each add operation, estimated task size before execution is about 58KB,
> > actual task size reported after execution is about 1.09MB. Database file
> > size after db.close() is about 9MB.
> 
> Thank you!
> 
> I think this test demonstrates two issues:
> 
> (1) In the worst case, when we store an object, we replicate the store in
> every index.
> 
> (2) In the worst worst case, SQLite may use even more space than (1), due to
> internal implementation details.
> 
> It looks like this patch attempts to address (1) but not (2). So, let's set
> aside (2) for the moment, and treat it as a separate bug. (Probably the fix
> for that bug will be to use "PRAGMA auto_vacuum = FULL" or similar.)
> 
> In the case of (1), it looks like this patch attempts to address the subset
> of cases where the indexing overhead accumulates gradually over many stores,
> but not immediately in one store. So, if I set indexCount to 1000000, or I
> change createObject() to call randomArray(1000000)), this patch does not
> stop me.
> 
> Is that right?
> 
Right.

> Consider this algorithm, which I'll name INDEX_ESTIMATE_SYNC:
> 
> bytes = (object size) * (number of indices)
> if (bytes > bytesAvailable)
>     bytesAvailable = synchronously read file sizes on disk
>     if (bytes > bytesAvailable)
>         bytesAvailable = ask user for more space
>     if (bytes > bytesAvailable)
>         return an error
> bytesAvailable -= bytes
> 
> And this algorithm, which I'll name INDEX_ESTIMATE_ASYNC:
> 
> bytes = (object size) * (number of indices)
> if (bytes > bytesAvailable)
>     if (bytesAvailableNeedsUpdate)
>         schedule a task:
>             bytesAvailable = synchronously read file sizes on disk
>             bytesAvailableNeedsUpdate = false
>             restart this algorithm
>     else
>         bytesAvailable = ask user for more space
>         if (bytes > bytesAvailable)
>             return an error
> 
> bytesAvailableNeedsUpdate = true
> bytesAvailable -= bytes
> 
> Is INDEX_ESTIMATE or INDEX_ESTIMATE_ASYNC practical to implement in our
> database task system?
> 
> I believe that INDEX_ESTIMATE_* would solve the problem in (1) in all cases.
> I also believe it has the benefit of being a little more straightforward.
> The process of estimating a size, reserving space for it, and then passing
> back the difference between the estimated size and the true size (which is
> itself still an estimate) is challenging to follow.
> 
> What do you think?

I think this proposal is doable. Current patch would give a more accurate estimate than the proposed one but it would update the size estimate only after task execution, before which StorageQuotaManager may already grant some requests that it should not. It's more safe and cleaner to have a good task estimate when requesting the space.
Comment 11 Sihui Liu 2019-10-04 17:37:55 PDT
Created attachment 380273 [details]
Patch
Comment 12 Sihui Liu 2019-10-07 08:46:19 PDT
Created attachment 380330 [details]
Patch
Comment 13 Geoffrey Garen 2019-10-07 20:10:01 PDT
Comment on attachment 380330 [details]
Patch

r=me

Unless the result is impractical, I think we should land this change with a test case similar to the one you uploaded, with indexCount set to 1000000, and/or createObject() set to call randomArray(1000000)). That way, the test documents the reason we made this implementation choice, and it can be used to qualified other implementations we may want to try out in the future.
Comment 14 Geoffrey Garen 2019-10-07 20:12:11 PDT
(The test case should verify that the store fails.)
Comment 15 Sihui Liu 2019-10-09 13:07:57 PDT
Created attachment 380558 [details]
Patch
Comment 16 Sihui Liu 2019-10-09 14:16:17 PDT
Created attachment 380564 [details]
Patch for landing
Comment 17 Sihui Liu 2019-10-09 14:19:00 PDT
Created attachment 380566 [details]
Patch for landing
Comment 18 WebKit Commit Bot 2019-10-09 15:20:36 PDT
Comment on attachment 380566 [details]
Patch for landing

Clearing flags on attachment: 380566

Committed r250936: <https://trac.webkit.org/changeset/250936>
Comment 19 WebKit Commit Bot 2019-10-09 15:20:38 PDT
All reviewed patches have been landed.  Closing bug.
Comment 20 Radar WebKit Bug Importer 2019-10-09 15:21:17 PDT
<rdar://problem/56132514>