RESOLVED FIXED 164754
IndexedDB 2.0: Key collation during SQLite lookups is insanely slow
https://bugs.webkit.org/show_bug.cgi?id=164754
Summary IndexedDB 2.0: Key collation during SQLite lookups is insanely slow
Brady Eidson
Reported 2016-11-14 17:19:31 PST
IndexedDB 2.0: Key collation during SQLite lookups is insanely slow This is what causes us to fail imported/w3c/web-platform-tests/IndexedDB/idbindex-multientry-big.htm due to timing out. Looking at profiles one can confirm that a vast majority of the time is spent under WebCore::idbKeyCollate, and most of that time is spent allocating, using, and deallocating both Strings and CFPropertyLists. Using KeyedEncoder/Decoder for IDBKeys on disk was probably a bad idea; Common keys (e.g. a number) are *MUCH* larger than they need to be, and all keys require a lot of memory churn and CPU time just for SQLite's comparison sake. If we move to a custom serialization format targeted at IDBKeys we can vastly improve this. That last statement is not a hypothesis; The afore mentioned idbindex-multientry-big.htm takes ~26 seconds to complete with timeouts disabled today, but with a custom format it takes less than 2 seconds. And profiling confirms that whereas collation is ~97% of the 26 seconds, it is less than 1.4% of the ~2 seconds.
Attachments
WIP (not ready for review) (11.29 KB, patch)
2016-11-14 17:25 PST, Brady Eidson
no flags
Patch (16.36 KB, patch)
2016-11-14 20:55 PST, Brady Eidson
no flags
Patch (16.64 KB, patch)
2016-11-14 21:52 PST, Brady Eidson
no flags
Patch (16.69 KB, patch)
2016-11-15 08:50 PST, Brady Eidson
no flags
Patch (17.48 KB, patch)
2016-11-15 12:36 PST, Brady Eidson
no flags
Patch (16.55 KB, patch)
2016-11-15 12:37 PST, Brady Eidson
no flags
Patch (15.76 KB, patch)
2016-11-15 13:34 PST, Brady Eidson
achristensen: review+
Brady Eidson
Comment 1 2016-11-14 17:25:41 PST
Created attachment 294784 [details] WIP (not ready for review) This is almost ready. I need to clean up a few things, but also need to clear up the glaring FIXME: Figure out the magic byte for Glib keyed coding on linux.
Brady Eidson
Comment 2 2016-11-14 19:44:23 PST
(In reply to comment #1) > Created attachment 294784 [details] > WIP (not ready for review) > > This is almost ready. > > I need to clean up a few things, but also need to clear up the glaring > FIXME: Figure out the magic byte for Glib keyed coding on linux. After commuting home and enjoying a refreshing dinner, I of course realize I can just "#if PLATFORM(MAC)" the new stuff and let linux folks come along and get the new hotness working later.
Brady Eidson
Comment 3 2016-11-14 20:55:09 PST
Brady Eidson
Comment 4 2016-11-14 21:52:07 PST
Brady Eidson
Comment 5 2016-11-14 22:10:27 PST
The Windows build error: c:\cygwin\home\buildbot\webkit\source\webcore\modules\indexeddb\server\idbserialization.cpp(166): warning C4715: 'WebCore::serializedTypeForKeyType': not all control paths return a value [C:\cygwin\home\buildbot\WebKit\WebKitBuild\Release\Source\WebCore\WebCore.vcxproj] Except every case *is* handled in that switch. It is true the last one doesn't return, but does RELEASE_ASSERT_NOT_REACHED(). Seems like an MSVC limitation and one I'm not sure how we normally work around...
Brady Eidson
Comment 6 2016-11-14 22:27:53 PST
Some other notes not explicitly called out that I'll add to the changelog. Keys previously encoded with KeyedEncoderCF (binary property lists) will continue to be readable, and be comparable to both old and new style keys. We make no effort to migrate old-style keys, but will only ever write new-style keys.
Brady Eidson
Comment 7 2016-11-15 08:50:07 PST
Alex Christensen
Comment 8 2016-11-15 11:05:16 PST
Comment on attachment 294839 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=294839&action=review > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:148 > + switch (type) { I think MSVC is complaining about the possibility that we get a type that is not in the enum, so we should have a RELEASE_ASSERT_NOT_REACHED and return something after the switch. > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:169 > +#if CPU(BIG_ENDIAN) || CPU(MIDDLE_ENDIAN) || CPU(NEEDS_ALIGNED_ACCESS) Can we not make this endianness-independent? The same machine is going to be encoding and decoding these values, right? If not, can we only have one #if instead of defining something that's used twice for no reason.
Brady Eidson
Comment 9 2016-11-15 12:34:10 PST
(In reply to comment #8) > Comment on attachment 294839 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=294839&action=review > > > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:148 > > + switch (type) { > > I think MSVC is complaining about the possibility that we get a type that is > not in the enum, so we should have a RELEASE_ASSERT_NOT_REACHED and return > something after the switch. It's being different from gcc which is sad. > > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:169 > > +#if CPU(BIG_ENDIAN) || CPU(MIDDLE_ENDIAN) || CPU(NEEDS_ALIGNED_ACCESS) > > Can we not make this endianness-independent? The same machine is going to > be encoding and decoding these values, right? Not necessarily, can move database files between machines for debugging purposes, at the very least. > If not, can we only have one #if instead of defining something that's used > twice for no reason. I guess.
Brady Eidson
Comment 10 2016-11-15 12:36:44 PST
Brady Eidson
Comment 11 2016-11-15 12:37:37 PST
Alex Christensen
Comment 12 2016-11-15 13:10:38 PST
Comment on attachment 294867 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=294867&action=review > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:103 > + Where does this come from? > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:135 > +static const uint8_t SIDBKeyVersion = 0x00; This is far from the comment about it being 0x00. > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:173 > + for (unsigned i = 0; i < sizeof(T); i++) { size_t > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:184 > + if (sizeof(T) == 1) This special case is not necessary, and makes it harder to see the symmetry of this function and the corresponding write function. > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:188 > + for (unsigned i = 0; i < sizeof(T); i++) size_t > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:204 > + if (sizeof(T) == 1) Also not necessary. > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:216 > + union { *reinterpret_cast<uint64_t*>(&d) accomplishes the same thing without all this union stuff. I guess we don't have bit_cast in WebKit yet. > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:289 > + Vector<char> data; I think this should be a Vector<uint8_t> > Source/WebCore/Modules/indexeddb/server/IDBSerialization.h:42 > +int compareSerializedIDBKeyData(const uint8_t* buffer1, size_t buffer1Size, const uint8_t* buffer2, size_t buffer2Size); This doesn't seem to be used.
Brady Eidson
Comment 13 2016-11-15 13:32:51 PST
(In reply to comment #12) > Comment on attachment 294867 [details] > Patch > > > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:289 > > + Vector<char> data; > > I think this should be a Vector<uint8_t> It cannot be because SharedBuffers expect only Vector<char> > > > Source/WebCore/Modules/indexeddb/server/IDBSerialization.h:42 > > +int compareSerializedIDBKeyData(const uint8_t* buffer1, size_t buffer1Size, const uint8_t* buffer2, size_t buffer2Size); > > This doesn't seem to be used. Yah, stashed the def. away for a later day, forgot to remove the decl.
Brady Eidson
Comment 14 2016-11-15 13:34:41 PST
Alex Christensen
Comment 15 2016-11-15 13:39:06 PST
Comment on attachment 294871 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=294871&action=review > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:181 > +template <typename T> static bool readLittleEndian(const uint8_t*& ptr, const uint8_t* end, T& value) Can't this return an Optional<T> instead of a bool and passing a T&?
Brady Eidson
Comment 16 2016-11-15 13:42:08 PST
(In reply to comment #15) > Comment on attachment 294871 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=294871&action=review > > > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:181 > > +template <typename T> static bool readLittleEndian(const uint8_t*& ptr, const uint8_t* end, T& value) > > Can't this return an Optional<T> instead of a bool and passing a T&? Yes, that would technically work. Because of the nature of at least some of the deserializations I don't think it would make the code any better. (Plus we're already resigned to having out parameters on these since the ptr is mutable)
Brady Eidson
Comment 17 2016-11-15 16:25:59 PST
Brady Eidson
Comment 18 2016-11-15 18:41:44 PST
Note You need to log in before you can comment on or make changes to this bug.