Bug 226832 - Regression(r276653) We're going to disk more often for local storage operations
Summary: Regression(r276653) We're going to disk more often for local storage operations
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit2 (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Chris Dumez
URL:
Keywords: InRadar
Depends on: 226788
Blocks:
  Show dependency treegraph
 
Reported: 2021-06-09 11:43 PDT by Chris Dumez
Modified: 2021-06-11 13:35 PDT (History)
7 users (show)

See Also:


Attachments
Patch (8.30 KB, patch)
2021-06-09 11:45 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (8.14 KB, patch)
2021-06-10 13:51 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (8.19 KB, patch)
2021-06-10 15:43 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (8.19 KB, patch)
2021-06-11 13:08 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Dumez 2021-06-09 11:43:25 PDT
We're going to disk more often for local storage operations since r276653 because we no longer keep items in memory. This results in a slightly increased power usage on one of our benchmarks. As a first step to improve this, I am reintroducing a cache of the items in memory, as long as the values are not too large (1Kb limit). We still go to disk to look up values that are larger than 1Kb to avoid regressing memory usage.
Comment 1 Chris Dumez 2021-06-09 11:45:39 PDT
Created attachment 430985 [details]
Patch
Comment 2 Sihui Liu 2021-06-10 11:34:16 PDT
Do you have link to radar about the regression? Do we know what operation causes the regression? (If it's write, this does not seem to help?)
Comment 3 Chris Dumez 2021-06-10 11:47:37 PDT
(In reply to Sihui Liu from comment #2)
> Do you have link to radar about the regression? Do we know what operation
> causes the regression? (If it's write, this does not seem to help?)

We have a regression. The change that introduced this dropped 2 things: the in memory cache of the items and the batching of writes to the database. This patch reintroduced the in memory cache (with size limitation) and I have a follow up patch to reintroduce the batching of writes.
Comment 4 Geoffrey Garen 2021-06-10 11:49:31 PDT
Comment on attachment 430985 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=430985&action=review

Looks mostly good but I am confused by the value in the hash table, and maybe that means I found a bug.

> Source/WebKit/ChangeLog:12
> +        in memory, as long as the values are not too large (1Kb limit). We still go to disk to
> +        look up values that are larger than 1Kb to avoid regressing memory usage.

Seems like we're still always writing values to disk eagerly. Is that because the writes did not show up as costly / frequent in the benchmark?

> Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:42
> +constexpr unsigned maximumSizeForValuesKeptInMemory { 1024 }; // 1Kb

For my most pedantic review comment ever, I will point out that the capitalization is "kB" [ https://en.wikipedia.org/wiki/Kilobyte#cite_note-IEC80000-1 ].

> Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:150
> +            auto value = pair.value->isNull() ? itemBypassingCache(pair.key) : *pair.value;

Do we need to check pair.value for the unset optional value here?

> Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:162
> +    m_items = CachedItemsMap { };

Is it sufficient that we only initialize / use our cache if the client asks for all items? Is that guaranteed to happen, or expected to happen in most cases?

> Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.h:77
> +    // Cached version of the items in memory.
> +    // If the value is too large to keep in memory, we store a null String. We use std::optional<> for the
> +    // value to distinguish the key being missing vs the key being present but its value not being stored in
> +    // memory.

I'm a little lost in our use of two null values (either unset optional or set optional with null String). If a key is absent, that means there is no entry in the database. If a key is present and a value is set optional with null String, that means there is an entry in the database and it is not cached (because it is large). What does it mean when a key is present and a value is unset optional?
Comment 5 Chris Dumez 2021-06-10 11:55:33 PDT
(In reply to Geoffrey Garen from comment #4)
> Comment on attachment 430985 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=430985&action=review
> 
> Looks mostly good but I am confused by the value in the hash table, and
> maybe that means I found a bug.
> 
> > Source/WebKit/ChangeLog:12
> > +        in memory, as long as the values are not too large (1Kb limit). We still go to disk to
> > +        look up values that are larger than 1Kb to avoid regressing memory usage.
> 
> Seems like we're still always writing values to disk eagerly. Is that
> because the writes did not show up as costly / frequent in the benchmark?
> 
> > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:42
> > +constexpr unsigned maximumSizeForValuesKeptInMemory { 1024 }; // 1Kb
> 
> For my most pedantic review comment ever, I will point out that the
> capitalization is "kB" [
> https://en.wikipedia.org/wiki/Kilobyte#cite_note-IEC80000-1 ].
> 
> > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:150
> > +            auto value = pair.value->isNull() ? itemBypassingCache(pair.key) : *pair.value;
> 
> Do we need to check pair.value for the unset optional value here?
> 
> > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:162
> > +    m_items = CachedItemsMap { };
> 
> Is it sufficient that we only initialize / use our cache if the client asks
> for all items? Is that guaranteed to happen, or expected to happen in most
> cases?
> 
> > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.h:77
> > +    // Cached version of the items in memory.
> > +    // If the value is too large to keep in memory, we store a null String. We use std::optional<> for the
> > +    // value to distinguish the key being missing vs the key being present but its value not being stored in
> > +    // memory.
> 
> I'm a little lost in our use of two null values (either unset optional or
> set optional with null String). If a key is absent, that means there is no
> entry in the database. If a key is present and a value is set optional with
> null String, that means there is an entry in the database and it is not
> cached (because it is large). What does it mean when a key is present and a
> value is unset optional?

We never store std::nullopt in the hashmap, this is the « empty » value. So if I call get() and get std::nullopt then I know the key wasn’t prevent. If I get an optional with a string in it, then I know the key was present. If the string inside the optional is null, then it means it is too large and we have to fetch it from disk.
Comment 6 Darin Adler 2021-06-10 12:02:26 PDT
Comment on attachment 430985 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=430985&action=review

I don’t understand why Geoff did cq- and not review-; I was going to do review+ but I think Geoff gets to finish he review now.

I agree that we can make the use of optional and null clearer with comments. I’m OK with how you are using them but it’s subtle and as we know, subtle is not good for future programmer happiness.

>> Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:42
>> +constexpr unsigned maximumSizeForValuesKeptInMemory { 1024 }; // 1Kb
> 
> For my most pedantic review comment ever, I will point out that the capitalization is "kB" [ https://en.wikipedia.org/wiki/Kilobyte#cite_note-IEC80000-1 ].

Greatest comment. But not the most pedantic you could have written since you didn’t point out that 1kB is 1000 bytes.
Comment 7 Chris Dumez 2021-06-10 12:09:48 PDT
Any proposal on how to make the HashMap clearer? Maybe introduce my own class to represent the value instead of using std::optional?
Comment 8 Chris Dumez 2021-06-10 12:11:59 PDT
(In reply to Chris Dumez from comment #7)
> Any proposal on how to make the HashMap clearer? Maybe introduce my own
> class to represent the value instead of using std::optional?

Another way would be to use 2 separate container, a HashMap for fully cached values and a HashSet for keys whose value is too big. That would mean double hash lookup in some cases though.
Comment 9 Darin Adler 2021-06-10 13:10:20 PDT
(In reply to Chris Dumez from comment #7)
> Any proposal on how to make the HashMap clearer? Maybe introduce my own
> class to represent the value instead of using std::optional?

I don’t have any great ideas, but here are some less great ones:

Maybe a Variant would be clearer, so you could name each enumeration for the two null cases? Or WTF::Expected might work, with the unexpected values being that same enumeration I mentioned.

Or could use a class that could still maybe even store things as an optional and null, but the code would be written in a way that’s more explicit rather than using the std::optional and WTF::String functions directly? And with that class you could even change the storage to be even more efficient later if you wanted.
Comment 10 Chris Dumez 2021-06-10 13:15:53 PDT
(In reply to Darin Adler from comment #9)
> (In reply to Chris Dumez from comment #7)
> > Any proposal on how to make the HashMap clearer? Maybe introduce my own
> > class to represent the value instead of using std::optional?
> 
> I don’t have any great ideas, but here are some less great ones:
> 
> Maybe a Variant would be clearer, so you could name each enumeration for the
> two null cases? Or WTF::Expected might work, with the unexpected values
> being that same enumeration I mentioned.

I thought about a Variant but then I started wondering, what do I get as result if I call HashMap::get() with a key that doesn't exist? I don't think we have traits to handle Variant types nicely in HashMap? I also couldn't find any usage of Variant in HashMaps in the codebase.

> 
> Or could use a class that could still maybe even store things as an optional
> and null, but the code would be written in a way that’s more explicit rather
> than using the std::optional and WTF::String functions directly? And with
> that class you could even change the storage to be even more efficient later
> if you wanted.

Yes, I think I may just have to do that.
Comment 11 Chris Dumez 2021-06-10 13:22:17 PDT
(In reply to Geoffrey Garen from comment #4)
> Comment on attachment 430985 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=430985&action=review
> 
> Looks mostly good but I am confused by the value in the hash table, and
> maybe that means I found a bug.
> 
> > Source/WebKit/ChangeLog:12
> > +        in memory, as long as the values are not too large (1Kb limit). We still go to disk to
> > +        look up values that are larger than 1Kb to avoid regressing memory usage.
> 
> Seems like we're still always writing values to disk eagerly. Is that
> because the writes did not show up as costly / frequent in the benchmark?

Batching the writes is planned as a follow-up.

> 
> > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:42
> > +constexpr unsigned maximumSizeForValuesKeptInMemory { 1024 }; // 1Kb
> 
> For my most pedantic review comment ever, I will point out that the
> capitalization is "kB" [
> https://en.wikipedia.org/wiki/Kilobyte#cite_note-IEC80000-1 ].

Will fix.

> 
> > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:150
> > +            auto value = pair.value->isNull() ? itemBypassingCache(pair.key) : *pair.value;
> 
> Do we need to check pair.value for the unset optional value here?

I don't think so. You cannot store std::nullopt as a value in the HashMap AFAIK and we certainly don't with the current code.

> 
> > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:162
> > +    m_items = CachedItemsMap { };
> 
> Is it sufficient that we only initialize / use our cache if the client asks
> for all items? Is that guaranteed to happen, or expected to happen in most
> cases?

It always happens. That's the first thing that the client does since the WebProcess has its own copy of the data in memory and then only sends IPC to the network process to propagate updates to the network process (and other WebProcesses with pages of the same origin).

> 
> > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.h:77
> > +    // Cached version of the items in memory.
> > +    // If the value is too large to keep in memory, we store a null String. We use std::optional<> for the
> > +    // value to distinguish the key being missing vs the key being present but its value not being stored in
> > +    // memory.
> 
> I'm a little lost in our use of two null values (either unset optional or
> set optional with null String). If a key is absent, that means there is no
> entry in the database. If a key is present and a value is set optional with
> null String, that means there is an entry in the database and it is not
> cached (because it is large). What does it mean when a key is present and a
> value is unset optional?
Comment 12 Geoffrey Garen 2021-06-10 13:43:06 PDT
OK, I misunderstood before and now I understand.

I'd suggest HashMap<String, String>, and switching from get() to find().

I think a find() call followed by one check for "not found" and another check for "found, with null value", plus a comment about null values at the declaration site of the table, will be pretty clear. One specific improvement in clarity will be that there is only one null value possible (String()), rather than two (optional { }, optional { String() }).
Comment 13 Chris Dumez 2021-06-10 13:51:08 PDT
Created attachment 431122 [details]
Patch
Comment 14 Geoffrey Garen 2021-06-10 15:33:11 PDT
Comment on attachment 431122 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=431122&action=review

Looks good to me, but I think I spotted a bug.

> Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:151
> +        for (auto& pair : *m_items) {
> +            auto value = pair.value.isNull() ? itemBypassingCache(pair.key) : pair.value;
> +            items.add(pair.key, WTFMove(value));

I would call "pair" "entry" here.

> Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:217
> +        if (it != m_items->end())
> +            return { };

I think this is supposed to be ==
Comment 15 Chris Dumez 2021-06-10 15:43:02 PDT
Created attachment 431149 [details]
Patch
Comment 16 Ben Nham 2021-06-11 10:12:07 PDT
Another option here if we want to lean more on SQLite is to just issue BEGIN/END TRANSACTIONs on our own. Right now we are executing statements outside of the context of a transaction, so each statement acts as if it was wrapped in its own BEGIN/END TRANSACTION (which is what is causing more writes, as each transaction commit means writing to durable storage).

So basically the policy might be that we coalesce every N statements into its own commit by surrounding those N statements with a BEGIN/END transaction. There would also probably need to be a timeout on this coalescing so that we have some upper bound on the amount of time we wait before issuing an END TRANSACTION.
Comment 17 Chris Dumez 2021-06-11 10:14:07 PDT
(In reply to Ben Nham from comment #16)
> Another option here if we want to lean more on SQLite is to just issue
> BEGIN/END TRANSACTIONs on our own. Right now we are executing statements
> outside of the context of a transaction, so each statement acts as if it was
> wrapped in its own BEGIN/END TRANSACTION (which is what is causing more
> writes, as each transaction commit means writing to durable storage).
> 
> So basically the policy might be that we coalesce every N statements into
> its own commit by surrounding those N statements with a BEGIN/END
> transaction. There would also probably need to be a timeout on this
> coalescing so that we have some upper bound on the amount of time we wait
> before issuing an END TRANSACTION.

This is what I have planned in my next patch, for the writes. This patch is about the reads.
Comment 18 Darin Adler 2021-06-11 12:57:01 PDT
Comment on attachment 431149 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=431149&action=review

Geoff reviewed this, but it says review? so I am setting review+.

> Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:41
> +static constexpr ASCIILiteral getItemsQueryString { "SELECT key, value FROM ItemTable"_s };

No need for static, no need for ASCIILiteral.

    constexpr auto getItemsQueryString { "SELECT key, value FROM ItemTable"_s };

> Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:149
> +        for (auto& entry : *m_items) {

For better performance, consider adding this?

    items.reserveInitialCapacity(m_items->size());
Comment 19 Chris Dumez 2021-06-11 12:58:11 PDT
(In reply to Darin Adler from comment #18)
> Comment on attachment 431149 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=431149&action=review
> 
> Geoff reviewed this, but it says review? so I am setting review+.

Geoff has r- twice but never r+.

> 
> > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:41
> > +static constexpr ASCIILiteral getItemsQueryString { "SELECT key, value FROM ItemTable"_s };
> 
> No need for static, no need for ASCIILiteral.
> 
>     constexpr auto getItemsQueryString { "SELECT key, value FROM
> ItemTable"_s };

Will fix.

> 
> > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:149
> > +        for (auto& entry : *m_items) {
> 
> For better performance, consider adding this?
> 
>     items.reserveInitialCapacity(m_items->size());

Good idea, will do.
Comment 20 Chris Dumez 2021-06-11 13:08:21 PDT
Created attachment 431226 [details]
Patch
Comment 21 Chris Dumez 2021-06-11 13:34:38 PDT
Comment on attachment 431226 [details]
Patch

Clearing flags on attachment: 431226

Committed r278778 (238735@main): <https://commits.webkit.org/238735@main>
Comment 22 Chris Dumez 2021-06-11 13:34:41 PDT
All reviewed patches have been landed.  Closing bug.
Comment 23 Radar WebKit Bug Importer 2021-06-11 13:35:38 PDT
<rdar://problem/79213716>