202483 – IndexedDB: include size of index records in size estimate of put/add task

RESOLVED FIXED 202483

IndexedDB: include size of index records in size estimate of put/add task

https://bugs.webkit.org/show_bug.cgi?id=202483

Summary IndexedDB: include size of index records in size estimate of put/add task

Sihui Liu

Reported 2019-10-02 11:04:41 PDT

...

Attachments
Patch (38.03 KB, patch) 2019-10-02 11:59 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (38.01 KB, patch) 2019-10-02 14:21 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
test.html (1.91 KB, text/html) 2019-10-02 18:59 PDT, Sihui Liu	no flags	Details
Patch (64.89 KB, patch) 2019-10-02 20:32 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (2.24 KB, patch) 2019-10-04 17:37 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (2.24 KB, patch) 2019-10-07 08:46 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
Patch (9.75 KB, patch) 2019-10-09 13:07 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
Patch for landing (9.74 KB, patch) 2019-10-09 14:16 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
Patch for landing (9.80 KB, patch) 2019-10-09 14:19 PDT, Sihui Liu	no flags	Details Formatted Diff Diff
Show Obsolete (7) View All Add attachment proposed patch, testcase, etc.

Sihui Liu

Comment 1 2019-10-02 11:59:37 PDT

Created attachment 380045 [details] Patch

Geoffrey Garen

Comment 2 2019-10-02 12:44:58 PDT

Comment on attachment 380045 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=380045&action=review > Source/WebCore/Modules/indexeddb/shared/IDBError.cpp:70 > +uint64_t IDBError::taskSize() const > +{ > + ASSERT(isNull() || (!isNull() && !m_size)); > + return m_size; > +} It's an unfortunate / surprising design to use an error object to communicate a successful result (size). Usually, an error communicates that something failed. But in this case, you're using it to communicate that something succeeded. Can you return the size of the task some other way? Or Make an abstraction like a Variant that's a either an error or a size, depending on whether the task succeeded?

Geoffrey Garen

Comment 3 2019-10-02 12:52:36 PDT

Comment on attachment 380045 [details] Patch Is the purpose of this design to avoid storing something too big, or to avoid prompting the user too much? Can you include a test case that demonstrates the bug we're trying to fix by being more precise? It's probably easier to talk about whether a test passes or fails than it is to discuss design in the abstract. If the purpose of this design is to avoid storing something too big, I'm not sure we've achieved that goal. An item that is estimated to be small, but turns out to be big, will be stored. Later, we will update our estimate to know that we've gone over quota. But by that time, we're already over quota, possibly by a lot. So, I'd like to understand what we're trying to do with this more precise estimate.

Sihui Liu

Comment 4 2019-10-02 14:21:36 PDT

Created attachment 380059 [details] Patch

Sihui Liu

Comment 5 2019-10-02 18:59:45 PDT

Created attachment 380076 [details] test.html

Sihui Liu

Comment 6 2019-10-02 19:17:40 PDT

(In reply to Geoffrey Garen from comment #3) > Comment on attachment 380045 [details] > Patch > > Is the purpose of this design to avoid storing something too big, or to > avoid prompting the user too much? > > Can you include a test case that demonstrates the bug we're trying to fix by > being more precise? It's probably easier to talk about whether a test passes > or fails than it is to discuss design in the abstract. > Test file uploaded. For each add operation, estimated task size before execution is about 58KB, actual task size reported after execution is about 1.09MB. Database file size after db.close() is about 9MB. > If the purpose of this design is to avoid storing something too big, I'm not > sure we've achieved that goal. An item that is estimated to be small, but > turns out to be big, will be stored. Later, we will update our estimate to > know that we've gone over quota. But by that time, we're already over quota, > possibly by a lot. So, I'd like to understand what we're trying to do with > this more precise estimate. That's true. This is an effort to reduce effect of this problem but not to solve the problem completely. One case where updating the size after task execution could be meaningful is when we have multiple transactions. As in the test file, it executes two transactions. When we set the quota to be 1MB and do not update the task size, both transactions would succeed. When we update the task size estimate, the second transaction would fail as the estimated usage would be updated at the end of transaction. Bette size estimate also helps the case where one origin has multiple databases or multiple storage types for the same reason.

Sihui Liu

Comment 7 2019-10-02 20:32:07 PDT

Created attachment 380077 [details] Patch

youenn fablet

Comment 8 2019-10-03 00:01:14 PDT

Comment on attachment 380077 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=380077&action=review > Source/WebCore/Modules/indexeddb/shared/IDBError.h:100 > +class IDBErrorOrSize { I would rename it to something like OperationSizeOrError or TaskSizeOrError. An Expected<uint64_t, IDBError> might be better in principle, it might require adding some makeUnexpected. Otherwise a Variant.

Geoffrey Garen

Comment 9 2019-10-03 21:25:02 PDT

> > Can you include a test case that demonstrates the bug we're trying to fix by > > being more precise? It's probably easier to talk about whether a test passes > > or fails than it is to discuss design in the abstract. > > > Test file uploaded. > > For each add operation, estimated task size before execution is about 58KB, > actual task size reported after execution is about 1.09MB. Database file > size after db.close() is about 9MB. Thank you! I think this test demonstrates two issues: (1) In the worst case, when we store an object, we replicate the store in every index. (2) In the worst worst case, SQLite may use even more space than (1), due to internal implementation details. It looks like this patch attempts to address (1) but not (2). So, let's set aside (2) for the moment, and treat it as a separate bug. (Probably the fix for that bug will be to use "PRAGMA auto_vacuum = FULL" or similar.) In the case of (1), it looks like this patch attempts to address the subset of cases where the indexing overhead accumulates gradually over many stores, but not immediately in one store. So, if I set indexCount to 1000000, or I change createObject() to call randomArray(1000000)), this patch does not stop me. Is that right? Consider this algorithm, which I'll name INDEX_ESTIMATE_SYNC: bytes = (object size) * (number of indices) if (bytes > bytesAvailable) bytesAvailable = synchronously read file sizes on disk if (bytes > bytesAvailable) bytesAvailable = ask user for more space if (bytes > bytesAvailable) return an error bytesAvailable -= bytes And this algorithm, which I'll name INDEX_ESTIMATE_ASYNC: bytes = (object size) * (number of indices) if (bytes > bytesAvailable) if (bytesAvailableNeedsUpdate) schedule a task: bytesAvailable = synchronously read file sizes on disk bytesAvailableNeedsUpdate = false restart this algorithm else bytesAvailable = ask user for more space if (bytes > bytesAvailable) return an error bytesAvailableNeedsUpdate = true bytesAvailable -= bytes Is INDEX_ESTIMATE or INDEX_ESTIMATE_ASYNC practical to implement in our database task system? I believe that INDEX_ESTIMATE_* would solve the problem in (1) in all cases. I also believe it has the benefit of being a little more straightforward. The process of estimating a size, reserving space for it, and then passing back the difference between the estimated size and the true size (which is itself still an estimate) is challenging to follow. What do you think?

Sihui Liu

Comment 10 2019-10-04 01:02:17 PDT

(In reply to Geoffrey Garen from comment #9) > > > Can you include a test case that demonstrates the bug we're trying to fix by > > > being more precise? It's probably easier to talk about whether a test passes > > > or fails than it is to discuss design in the abstract. > > > > > Test file uploaded. > > > > For each add operation, estimated task size before execution is about 58KB, > > actual task size reported after execution is about 1.09MB. Database file > > size after db.close() is about 9MB. > > Thank you! > > I think this test demonstrates two issues: > > (1) In the worst case, when we store an object, we replicate the store in > every index. > > (2) In the worst worst case, SQLite may use even more space than (1), due to > internal implementation details. > > It looks like this patch attempts to address (1) but not (2). So, let's set > aside (2) for the moment, and treat it as a separate bug. (Probably the fix > for that bug will be to use "PRAGMA auto_vacuum = FULL" or similar.) > > In the case of (1), it looks like this patch attempts to address the subset > of cases where the indexing overhead accumulates gradually over many stores, > but not immediately in one store. So, if I set indexCount to 1000000, or I > change createObject() to call randomArray(1000000)), this patch does not > stop me. > > Is that right? > Right. > Consider this algorithm, which I'll name INDEX_ESTIMATE_SYNC: > > bytes = (object size) * (number of indices) > if (bytes > bytesAvailable) > bytesAvailable = synchronously read file sizes on disk > if (bytes > bytesAvailable) > bytesAvailable = ask user for more space > if (bytes > bytesAvailable) > return an error > bytesAvailable -= bytes > > And this algorithm, which I'll name INDEX_ESTIMATE_ASYNC: > > bytes = (object size) * (number of indices) > if (bytes > bytesAvailable) > if (bytesAvailableNeedsUpdate) > schedule a task: > bytesAvailable = synchronously read file sizes on disk > bytesAvailableNeedsUpdate = false > restart this algorithm > else > bytesAvailable = ask user for more space > if (bytes > bytesAvailable) > return an error > > bytesAvailableNeedsUpdate = true > bytesAvailable -= bytes > > Is INDEX_ESTIMATE or INDEX_ESTIMATE_ASYNC practical to implement in our > database task system? > > I believe that INDEX_ESTIMATE_* would solve the problem in (1) in all cases. > I also believe it has the benefit of being a little more straightforward. > The process of estimating a size, reserving space for it, and then passing > back the difference between the estimated size and the true size (which is > itself still an estimate) is challenging to follow. > > What do you think? I think this proposal is doable. Current patch would give a more accurate estimate than the proposed one but it would update the size estimate only after task execution, before which StorageQuotaManager may already grant some requests that it should not. It's more safe and cleaner to have a good task estimate when requesting the space.

Sihui Liu

Comment 11 2019-10-04 17:37:55 PDT

Created attachment 380273 [details] Patch

Sihui Liu

Comment 12 2019-10-07 08:46:19 PDT

Created attachment 380330 [details] Patch

Geoffrey Garen

Comment 13 2019-10-07 20:10:01 PDT

Comment on attachment 380330 [details] Patch r=me Unless the result is impractical, I think we should land this change with a test case similar to the one you uploaded, with indexCount set to 1000000, and/or createObject() set to call randomArray(1000000)). That way, the test documents the reason we made this implementation choice, and it can be used to qualified other implementations we may want to try out in the future.

Geoffrey Garen

Comment 14 2019-10-07 20:12:11 PDT

(The test case should verify that the store fails.)

Sihui Liu

Comment 15 2019-10-09 13:07:57 PDT

Created attachment 380558 [details] Patch

Sihui Liu

Comment 16 2019-10-09 14:16:17 PDT

Created attachment 380564 [details] Patch for landing

Sihui Liu

Comment 17 2019-10-09 14:19:00 PDT

Created attachment 380566 [details] Patch for landing

WebKit Commit Bot

Comment 18 2019-10-09 15:20:36 PDT

Comment on attachment 380566 [details] Patch for landing Clearing flags on attachment: 380566 Committed r250936: <https://trac.webkit.org/changeset/250936>

WebKit Commit Bot

Comment 19 2019-10-09 15:20:38 PDT

All reviewed patches have been landed. Closing bug.

Radar WebKit Bug Importer

Comment 20 2019-10-09 15:21:17 PDT

<rdar://problem/56132514>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Sihui Liu

Reported

2019-10-02 11:04 PDT

Modified

2019-10-09 15:21 PDT History

CC List

8 users Show

URL

Keywords InRadar

Depends on

201957

Blocks

Dependencies

tree graph