RESOLVED FIXED Bug 226832
Regression(r276653) We're going to disk more often for local storage operations
https://bugs.webkit.org/show_bug.cgi?id=226832
Summary Regression(r276653) We're going to disk more often for local storage operations
Chris Dumez
Reported 2021-06-09 11:43:25 PDT
We're going to disk more often for local storage operations since r276653 because we no longer keep items in memory. This results in a slightly increased power usage on one of our benchmarks. As a first step to improve this, I am reintroducing a cache of the items in memory, as long as the values are not too large (1Kb limit). We still go to disk to look up values that are larger than 1Kb to avoid regressing memory usage.
Attachments
Patch (8.30 KB, patch)
2021-06-09 11:45 PDT, Chris Dumez
no flags
Patch (8.14 KB, patch)
2021-06-10 13:51 PDT, Chris Dumez
no flags
Patch (8.19 KB, patch)
2021-06-10 15:43 PDT, Chris Dumez
no flags
Patch (8.19 KB, patch)
2021-06-11 13:08 PDT, Chris Dumez
no flags
Chris Dumez
Comment 1 2021-06-09 11:45:39 PDT
Sihui Liu
Comment 2 2021-06-10 11:34:16 PDT
Do you have link to radar about the regression? Do we know what operation causes the regression? (If it's write, this does not seem to help?)
Chris Dumez
Comment 3 2021-06-10 11:47:37 PDT
(In reply to Sihui Liu from comment #2) > Do you have link to radar about the regression? Do we know what operation > causes the regression? (If it's write, this does not seem to help?) We have a regression. The change that introduced this dropped 2 things: the in memory cache of the items and the batching of writes to the database. This patch reintroduced the in memory cache (with size limitation) and I have a follow up patch to reintroduce the batching of writes.
Geoffrey Garen
Comment 4 2021-06-10 11:49:31 PDT
Comment on attachment 430985 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=430985&action=review Looks mostly good but I am confused by the value in the hash table, and maybe that means I found a bug. > Source/WebKit/ChangeLog:12 > + in memory, as long as the values are not too large (1Kb limit). We still go to disk to > + look up values that are larger than 1Kb to avoid regressing memory usage. Seems like we're still always writing values to disk eagerly. Is that because the writes did not show up as costly / frequent in the benchmark? > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:42 > +constexpr unsigned maximumSizeForValuesKeptInMemory { 1024 }; // 1Kb For my most pedantic review comment ever, I will point out that the capitalization is "kB" [ https://en.wikipedia.org/wiki/Kilobyte#cite_note-IEC80000-1 ]. > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:150 > + auto value = pair.value->isNull() ? itemBypassingCache(pair.key) : *pair.value; Do we need to check pair.value for the unset optional value here? > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:162 > + m_items = CachedItemsMap { }; Is it sufficient that we only initialize / use our cache if the client asks for all items? Is that guaranteed to happen, or expected to happen in most cases? > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.h:77 > + // Cached version of the items in memory. > + // If the value is too large to keep in memory, we store a null String. We use std::optional<> for the > + // value to distinguish the key being missing vs the key being present but its value not being stored in > + // memory. I'm a little lost in our use of two null values (either unset optional or set optional with null String). If a key is absent, that means there is no entry in the database. If a key is present and a value is set optional with null String, that means there is an entry in the database and it is not cached (because it is large). What does it mean when a key is present and a value is unset optional?
Chris Dumez
Comment 5 2021-06-10 11:55:33 PDT
(In reply to Geoffrey Garen from comment #4) > Comment on attachment 430985 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=430985&action=review > > Looks mostly good but I am confused by the value in the hash table, and > maybe that means I found a bug. > > > Source/WebKit/ChangeLog:12 > > + in memory, as long as the values are not too large (1Kb limit). We still go to disk to > > + look up values that are larger than 1Kb to avoid regressing memory usage. > > Seems like we're still always writing values to disk eagerly. Is that > because the writes did not show up as costly / frequent in the benchmark? > > > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:42 > > +constexpr unsigned maximumSizeForValuesKeptInMemory { 1024 }; // 1Kb > > For my most pedantic review comment ever, I will point out that the > capitalization is "kB" [ > https://en.wikipedia.org/wiki/Kilobyte#cite_note-IEC80000-1 ]. > > > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:150 > > + auto value = pair.value->isNull() ? itemBypassingCache(pair.key) : *pair.value; > > Do we need to check pair.value for the unset optional value here? > > > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:162 > > + m_items = CachedItemsMap { }; > > Is it sufficient that we only initialize / use our cache if the client asks > for all items? Is that guaranteed to happen, or expected to happen in most > cases? > > > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.h:77 > > + // Cached version of the items in memory. > > + // If the value is too large to keep in memory, we store a null String. We use std::optional<> for the > > + // value to distinguish the key being missing vs the key being present but its value not being stored in > > + // memory. > > I'm a little lost in our use of two null values (either unset optional or > set optional with null String). If a key is absent, that means there is no > entry in the database. If a key is present and a value is set optional with > null String, that means there is an entry in the database and it is not > cached (because it is large). What does it mean when a key is present and a > value is unset optional? We never store std::nullopt in the hashmap, this is the « empty » value. So if I call get() and get std::nullopt then I know the key wasn’t prevent. If I get an optional with a string in it, then I know the key was present. If the string inside the optional is null, then it means it is too large and we have to fetch it from disk.
Darin Adler
Comment 6 2021-06-10 12:02:26 PDT
Comment on attachment 430985 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=430985&action=review I don’t understand why Geoff did cq- and not review-; I was going to do review+ but I think Geoff gets to finish he review now. I agree that we can make the use of optional and null clearer with comments. I’m OK with how you are using them but it’s subtle and as we know, subtle is not good for future programmer happiness. >> Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:42 >> +constexpr unsigned maximumSizeForValuesKeptInMemory { 1024 }; // 1Kb > > For my most pedantic review comment ever, I will point out that the capitalization is "kB" [ https://en.wikipedia.org/wiki/Kilobyte#cite_note-IEC80000-1 ]. Greatest comment. But not the most pedantic you could have written since you didn’t point out that 1kB is 1000 bytes.
Chris Dumez
Comment 7 2021-06-10 12:09:48 PDT
Any proposal on how to make the HashMap clearer? Maybe introduce my own class to represent the value instead of using std::optional?
Chris Dumez
Comment 8 2021-06-10 12:11:59 PDT
(In reply to Chris Dumez from comment #7) > Any proposal on how to make the HashMap clearer? Maybe introduce my own > class to represent the value instead of using std::optional? Another way would be to use 2 separate container, a HashMap for fully cached values and a HashSet for keys whose value is too big. That would mean double hash lookup in some cases though.
Darin Adler
Comment 9 2021-06-10 13:10:20 PDT
(In reply to Chris Dumez from comment #7) > Any proposal on how to make the HashMap clearer? Maybe introduce my own > class to represent the value instead of using std::optional? I don’t have any great ideas, but here are some less great ones: Maybe a Variant would be clearer, so you could name each enumeration for the two null cases? Or WTF::Expected might work, with the unexpected values being that same enumeration I mentioned. Or could use a class that could still maybe even store things as an optional and null, but the code would be written in a way that’s more explicit rather than using the std::optional and WTF::String functions directly? And with that class you could even change the storage to be even more efficient later if you wanted.
Chris Dumez
Comment 10 2021-06-10 13:15:53 PDT
(In reply to Darin Adler from comment #9) > (In reply to Chris Dumez from comment #7) > > Any proposal on how to make the HashMap clearer? Maybe introduce my own > > class to represent the value instead of using std::optional? > > I don’t have any great ideas, but here are some less great ones: > > Maybe a Variant would be clearer, so you could name each enumeration for the > two null cases? Or WTF::Expected might work, with the unexpected values > being that same enumeration I mentioned. I thought about a Variant but then I started wondering, what do I get as result if I call HashMap::get() with a key that doesn't exist? I don't think we have traits to handle Variant types nicely in HashMap? I also couldn't find any usage of Variant in HashMaps in the codebase. > > Or could use a class that could still maybe even store things as an optional > and null, but the code would be written in a way that’s more explicit rather > than using the std::optional and WTF::String functions directly? And with > that class you could even change the storage to be even more efficient later > if you wanted. Yes, I think I may just have to do that.
Chris Dumez
Comment 11 2021-06-10 13:22:17 PDT
(In reply to Geoffrey Garen from comment #4) > Comment on attachment 430985 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=430985&action=review > > Looks mostly good but I am confused by the value in the hash table, and > maybe that means I found a bug. > > > Source/WebKit/ChangeLog:12 > > + in memory, as long as the values are not too large (1Kb limit). We still go to disk to > > + look up values that are larger than 1Kb to avoid regressing memory usage. > > Seems like we're still always writing values to disk eagerly. Is that > because the writes did not show up as costly / frequent in the benchmark? Batching the writes is planned as a follow-up. > > > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:42 > > +constexpr unsigned maximumSizeForValuesKeptInMemory { 1024 }; // 1Kb > > For my most pedantic review comment ever, I will point out that the > capitalization is "kB" [ > https://en.wikipedia.org/wiki/Kilobyte#cite_note-IEC80000-1 ]. Will fix. > > > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:150 > > + auto value = pair.value->isNull() ? itemBypassingCache(pair.key) : *pair.value; > > Do we need to check pair.value for the unset optional value here? I don't think so. You cannot store std::nullopt as a value in the HashMap AFAIK and we certainly don't with the current code. > > > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:162 > > + m_items = CachedItemsMap { }; > > Is it sufficient that we only initialize / use our cache if the client asks > for all items? Is that guaranteed to happen, or expected to happen in most > cases? It always happens. That's the first thing that the client does since the WebProcess has its own copy of the data in memory and then only sends IPC to the network process to propagate updates to the network process (and other WebProcesses with pages of the same origin). > > > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.h:77 > > + // Cached version of the items in memory. > > + // If the value is too large to keep in memory, we store a null String. We use std::optional<> for the > > + // value to distinguish the key being missing vs the key being present but its value not being stored in > > + // memory. > > I'm a little lost in our use of two null values (either unset optional or > set optional with null String). If a key is absent, that means there is no > entry in the database. If a key is present and a value is set optional with > null String, that means there is an entry in the database and it is not > cached (because it is large). What does it mean when a key is present and a > value is unset optional?
Geoffrey Garen
Comment 12 2021-06-10 13:43:06 PDT
OK, I misunderstood before and now I understand. I'd suggest HashMap<String, String>, and switching from get() to find(). I think a find() call followed by one check for "not found" and another check for "found, with null value", plus a comment about null values at the declaration site of the table, will be pretty clear. One specific improvement in clarity will be that there is only one null value possible (String()), rather than two (optional { }, optional { String() }).
Chris Dumez
Comment 13 2021-06-10 13:51:08 PDT
Geoffrey Garen
Comment 14 2021-06-10 15:33:11 PDT
Comment on attachment 431122 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431122&action=review Looks good to me, but I think I spotted a bug. > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:151 > + for (auto& pair : *m_items) { > + auto value = pair.value.isNull() ? itemBypassingCache(pair.key) : pair.value; > + items.add(pair.key, WTFMove(value)); I would call "pair" "entry" here. > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:217 > + if (it != m_items->end()) > + return { }; I think this is supposed to be ==
Chris Dumez
Comment 15 2021-06-10 15:43:02 PDT
Ben Nham
Comment 16 2021-06-11 10:12:07 PDT
Another option here if we want to lean more on SQLite is to just issue BEGIN/END TRANSACTIONs on our own. Right now we are executing statements outside of the context of a transaction, so each statement acts as if it was wrapped in its own BEGIN/END TRANSACTION (which is what is causing more writes, as each transaction commit means writing to durable storage). So basically the policy might be that we coalesce every N statements into its own commit by surrounding those N statements with a BEGIN/END transaction. There would also probably need to be a timeout on this coalescing so that we have some upper bound on the amount of time we wait before issuing an END TRANSACTION.
Chris Dumez
Comment 17 2021-06-11 10:14:07 PDT
(In reply to Ben Nham from comment #16) > Another option here if we want to lean more on SQLite is to just issue > BEGIN/END TRANSACTIONs on our own. Right now we are executing statements > outside of the context of a transaction, so each statement acts as if it was > wrapped in its own BEGIN/END TRANSACTION (which is what is causing more > writes, as each transaction commit means writing to durable storage). > > So basically the policy might be that we coalesce every N statements into > its own commit by surrounding those N statements with a BEGIN/END > transaction. There would also probably need to be a timeout on this > coalescing so that we have some upper bound on the amount of time we wait > before issuing an END TRANSACTION. This is what I have planned in my next patch, for the writes. This patch is about the reads.
Darin Adler
Comment 18 2021-06-11 12:57:01 PDT
Comment on attachment 431149 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=431149&action=review Geoff reviewed this, but it says review? so I am setting review+. > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:41 > +static constexpr ASCIILiteral getItemsQueryString { "SELECT key, value FROM ItemTable"_s }; No need for static, no need for ASCIILiteral. constexpr auto getItemsQueryString { "SELECT key, value FROM ItemTable"_s }; > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:149 > + for (auto& entry : *m_items) { For better performance, consider adding this? items.reserveInitialCapacity(m_items->size());
Chris Dumez
Comment 19 2021-06-11 12:58:11 PDT
(In reply to Darin Adler from comment #18) > Comment on attachment 431149 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=431149&action=review > > Geoff reviewed this, but it says review? so I am setting review+. Geoff has r- twice but never r+. > > > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:41 > > +static constexpr ASCIILiteral getItemsQueryString { "SELECT key, value FROM ItemTable"_s }; > > No need for static, no need for ASCIILiteral. > > constexpr auto getItemsQueryString { "SELECT key, value FROM > ItemTable"_s }; Will fix. > > > Source/WebKit/NetworkProcess/WebStorage/LocalStorageDatabase.cpp:149 > > + for (auto& entry : *m_items) { > > For better performance, consider adding this? > > items.reserveInitialCapacity(m_items->size()); Good idea, will do.
Chris Dumez
Comment 20 2021-06-11 13:08:21 PDT
Chris Dumez
Comment 21 2021-06-11 13:34:38 PDT
Comment on attachment 431226 [details] Patch Clearing flags on attachment: 431226 Committed r278778 (238735@main): <https://commits.webkit.org/238735@main>
Chris Dumez
Comment 22 2021-06-11 13:34:41 PDT
All reviewed patches have been landed. Closing bug.
Radar WebKit Bug Importer
Comment 23 2021-06-11 13:35:38 PDT
Note You need to log in before you can comment on or make changes to this bug.