Bug 164754 - IndexedDB 2.0: Key collation during SQLite lookups is insanely slow
Summary: IndexedDB 2.0: Key collation during SQLite lookups is insanely slow
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Brady Eidson
URL:
Keywords: InRadar
Depends on:
Blocks: 160306
  Show dependency treegraph
 
Reported: 2016-11-14 17:19 PST by Brady Eidson
Modified: 2016-11-15 18:41 PST (History)
6 users (show)

See Also:


Attachments
WIP (not ready for review) (11.29 KB, patch)
2016-11-14 17:25 PST, Brady Eidson
no flags Details | Formatted Diff | Diff
Patch (16.36 KB, patch)
2016-11-14 20:55 PST, Brady Eidson
no flags Details | Formatted Diff | Diff
Patch (16.64 KB, patch)
2016-11-14 21:52 PST, Brady Eidson
no flags Details | Formatted Diff | Diff
Patch (16.69 KB, patch)
2016-11-15 08:50 PST, Brady Eidson
no flags Details | Formatted Diff | Diff
Patch (17.48 KB, patch)
2016-11-15 12:36 PST, Brady Eidson
no flags Details | Formatted Diff | Diff
Patch (16.55 KB, patch)
2016-11-15 12:37 PST, Brady Eidson
no flags Details | Formatted Diff | Diff
Patch (15.76 KB, patch)
2016-11-15 13:34 PST, Brady Eidson
achristensen: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Brady Eidson 2016-11-14 17:19:31 PST
IndexedDB 2.0: Key collation during SQLite lookups is insanely slow

This is what causes us to fail imported/w3c/web-platform-tests/IndexedDB/idbindex-multientry-big.htm due to timing out.

Looking at profiles one can confirm that a vast majority of the time is spent under WebCore::idbKeyCollate, and most of that time is spent allocating, using, and deallocating both Strings and CFPropertyLists.

Using KeyedEncoder/Decoder for IDBKeys on disk was probably a bad idea; Common keys (e.g. a number) are *MUCH* larger than they need to be, and all keys require a lot of memory churn and CPU time just for SQLite's comparison sake.

If we move to a custom serialization format targeted at IDBKeys we can vastly improve this.

That last statement is not a hypothesis; The afore mentioned idbindex-multientry-big.htm takes ~26 seconds to complete with timeouts disabled today, but with a custom format it takes less than 2 seconds.
And profiling confirms that whereas collation is ~97% of the 26 seconds, it is less than 1.4% of the ~2 seconds.
Comment 1 Brady Eidson 2016-11-14 17:25:41 PST
Created attachment 294784 [details]
WIP (not ready for review)

This is almost ready.

I need to clean up a few things, but also need to clear up the glaring FIXME: Figure out the magic byte for Glib keyed coding on linux.
Comment 2 Brady Eidson 2016-11-14 19:44:23 PST
(In reply to comment #1)
> Created attachment 294784 [details]
> WIP (not ready for review)
> 
> This is almost ready.
> 
> I need to clean up a few things, but also need to clear up the glaring
> FIXME: Figure out the magic byte for Glib keyed coding on linux.

After commuting home and enjoying a refreshing dinner, I of course realize I can just "#if PLATFORM(MAC)" the new stuff and let linux folks come along and get the new hotness working later.
Comment 3 Brady Eidson 2016-11-14 20:55:09 PST
Created attachment 294803 [details]
Patch
Comment 4 Brady Eidson 2016-11-14 21:52:07 PST
Created attachment 294812 [details]
Patch
Comment 5 Brady Eidson 2016-11-14 22:10:27 PST
The Windows build error:

c:\cygwin\home\buildbot\webkit\source\webcore\modules\indexeddb\server\idbserialization.cpp(166): warning C4715: 'WebCore::serializedTypeForKeyType': not all control paths return a value [C:\cygwin\home\buildbot\WebKit\WebKitBuild\Release\Source\WebCore\WebCore.vcxproj]

Except every case *is* handled in that switch. It is true the last one doesn't return, but does RELEASE_ASSERT_NOT_REACHED().

Seems like an MSVC limitation and one I'm not sure how we normally work around...
Comment 6 Brady Eidson 2016-11-14 22:27:53 PST
Some other notes not explicitly called out that I'll add to the changelog.

Keys previously encoded with KeyedEncoderCF (binary property lists) will continue to be readable, and be comparable to both old and new style keys.

We make no effort to migrate old-style keys, but will only ever write new-style keys.
Comment 7 Brady Eidson 2016-11-15 08:50:07 PST
Created attachment 294839 [details]
Patch
Comment 8 Alex Christensen 2016-11-15 11:05:16 PST
Comment on attachment 294839 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=294839&action=review

> Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:148
> +    switch (type) {

I think MSVC is complaining about the possibility that we get a type that is not in the enum, so we should have a RELEASE_ASSERT_NOT_REACHED and return something after the switch.

> Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:169
> +#if CPU(BIG_ENDIAN) || CPU(MIDDLE_ENDIAN) || CPU(NEEDS_ALIGNED_ACCESS)

Can we not make this endianness-independent?  The same machine is going to be encoding and decoding these values, right?
If not, can we only have one #if instead of defining something that's used twice for no reason.
Comment 9 Brady Eidson 2016-11-15 12:34:10 PST
(In reply to comment #8)
> Comment on attachment 294839 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=294839&action=review
> 
> > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:148
> > +    switch (type) {
> 
> I think MSVC is complaining about the possibility that we get a type that is
> not in the enum, so we should have a RELEASE_ASSERT_NOT_REACHED and return
> something after the switch.

It's being different from gcc which is sad.

> > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:169
> > +#if CPU(BIG_ENDIAN) || CPU(MIDDLE_ENDIAN) || CPU(NEEDS_ALIGNED_ACCESS)
> 
> Can we not make this endianness-independent?  The same machine is going to
> be encoding and decoding these values, right?

Not necessarily, can move database files between machines for debugging purposes, at the very least.

> If not, can we only have one #if instead of defining something that's used
> twice for no reason.

I guess.
Comment 10 Brady Eidson 2016-11-15 12:36:44 PST
Created attachment 294865 [details]
Patch
Comment 11 Brady Eidson 2016-11-15 12:37:37 PST
Created attachment 294867 [details]
Patch
Comment 12 Alex Christensen 2016-11-15 13:10:38 PST
Comment on attachment 294867 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=294867&action=review

> Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:103
> +

Where does this come from?

> Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:135
> +static const uint8_t SIDBKeyVersion = 0x00;

This is far from the comment about it being 0x00.

> Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:173
> +    for (unsigned i = 0; i < sizeof(T); i++) {

size_t

> Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:184
> +    if (sizeof(T) == 1)

This special case is not necessary, and makes it harder to see the symmetry of this function and the corresponding write function.

> Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:188
> +        for (unsigned i = 0; i < sizeof(T); i++)

size_t

> Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:204
> +    if (sizeof(T) == 1)

Also not necessary.

> Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:216
> +    union {

*reinterpret_cast<uint64_t*>(&d) accomplishes the same thing without all this union stuff.  I guess we don't have bit_cast in WebKit yet.

> Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:289
> +    Vector<char> data;

I think this should be a Vector<uint8_t>

> Source/WebCore/Modules/indexeddb/server/IDBSerialization.h:42
> +int compareSerializedIDBKeyData(const uint8_t* buffer1, size_t buffer1Size, const uint8_t* buffer2, size_t buffer2Size);

This doesn't seem to be used.
Comment 13 Brady Eidson 2016-11-15 13:32:51 PST
(In reply to comment #12)
> Comment on attachment 294867 [details]
> Patch

> 
> > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:289
> > +    Vector<char> data;
> 
> I think this should be a Vector<uint8_t>

It cannot be because SharedBuffers expect only Vector<char>

> 
> > Source/WebCore/Modules/indexeddb/server/IDBSerialization.h:42
> > +int compareSerializedIDBKeyData(const uint8_t* buffer1, size_t buffer1Size, const uint8_t* buffer2, size_t buffer2Size);
> 
> This doesn't seem to be used.

Yah, stashed the def. away for a later day, forgot to remove the decl.
Comment 14 Brady Eidson 2016-11-15 13:34:41 PST
Created attachment 294871 [details]
Patch
Comment 15 Alex Christensen 2016-11-15 13:39:06 PST
Comment on attachment 294871 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=294871&action=review

> Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:181
> +template <typename T> static bool readLittleEndian(const uint8_t*& ptr, const uint8_t* end, T& value)

Can't this return an Optional<T> instead of a bool and passing a T&?
Comment 16 Brady Eidson 2016-11-15 13:42:08 PST
(In reply to comment #15)
> Comment on attachment 294871 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=294871&action=review
> 
> > Source/WebCore/Modules/indexeddb/server/IDBSerialization.cpp:181
> > +template <typename T> static bool readLittleEndian(const uint8_t*& ptr, const uint8_t* end, T& value)
> 
> Can't this return an Optional<T> instead of a bool and passing a T&?

Yes, that would technically work.

Because of the nature of at least some of the deserializations I don't think it would make the code any better.

(Plus we're already resigned to having out parameters on these since the ptr is mutable)
Comment 17 Brady Eidson 2016-11-15 16:25:59 PST
https://trac.webkit.org/changeset/208771
Comment 18 Brady Eidson 2016-11-15 18:41:44 PST
<rdar://problem/29231087>