Bug 62284

Summary: IndexedDB: implement compound (array) key support
Product: WebKit Reporter: Mark Pilgrim (Google) <pilgrim>
Component: New BugsAssignee: Joshua Bell <jsbell>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, dgrogan, fishd, hans, japhet, jsbell, tony, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
test case
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch for landing none

Description Mark Pilgrim (Google) 2011-06-08 08:35:38 PDT
Created attachment 96426 [details]
test case

Note in http://dvcs.w3.org/hg/IndexedDB/raw-file/tip/Overview.html#key-construct states "Infinite float values are valid keys. As are empty Arrays."

Expected behavior: add a record with an empty array as key
Actual behavior: FAIL request = objectStore.add([], []); threw exception Error: TYPE_MISMATCH_ERR: DOM Exception 17

Layout test attached.
Comment 1 Mark Pilgrim (Google) 2011-06-08 09:00:15 PDT
Turns out we don't accept any kind of array as a key. [1,2] and [[]] should also be valid keys, but they each throw.
Comment 2 Hans Wennborg 2011-06-08 09:08:55 PDT
(In reply to comment #1)
> Turns out we don't accept any kind of array as a key. [1,2] and [[]] should also be valid keys, but they each throw.

Yep, support for arrays as keys is on the TODO list (but not very high on it).
Comment 3 Joshua Bell 2011-10-31 17:20:23 PDT
Created attachment 113111 [details]
Patch
Comment 4 Joshua Bell 2011-10-31 17:22:10 PDT
Needs more tests before a proper review, but if hans@ or dgrogan@ want to take a look I'd appreciate it. Several of the inline array-populating methods should probably be factored out.
Comment 5 WebKit Review Bot 2011-10-31 17:23:32 PDT
Please wait for approval from fishd@chromium.org before submitting because this patch contains changes to the Chromium public API.
Comment 6 Darin Fisher (:fishd, Google) 2011-10-31 21:22:46 PDT
Comment on attachment 113111 [details]
Patch

WebKit API changes LGTM
Comment 7 Hans Wennborg 2011-11-01 10:15:54 PDT
Comment on attachment 113111 [details]
Patch

Good stuff! Very nice.

Some comments below. And you should extend IDBLevelDBCodingTest to cover this.
In fact, if you'd like to split up this patch to make it easier to review and land, that would probably be a natural division point: i.e. put the IDBKey, LevelDBCoding, LevelDBCodingTest in one patch, and the rest in another.

View in context: https://bugs.webkit.org/attachment.cgi?id=113111&action=review

> Source/WebCore/ChangeLog:3
> +        IndexedDB add() should accept array as key

maybe we should rename the bug to talk about compound keys instead

> Source/WebCore/bindings/v8/IDBBindingUtilities.cpp:79
> +                    return subkey;

hmm, why?

> Source/WebCore/storage/IDBKey.cpp:52
> +            if (result)

you could do "if (int result = m_array[i]->compare...)" and save a line :)

> Source/WebCore/storage/IDBKey.cpp:56
> +                (m_array.size() > other->m_array.size()) ? 1 : 0;

hmm, probably easier on the eyes to just do if-elseif-else here

> Source/WebCore/storage/IDBLevelDBCoding.cpp:368
> +            encodeIDBKey(*((*it).get()), into);

I'd probably split *((*it).get()) into two lines: "const RefPtr<IDBKey>& subKey = *it; encodeIdbKey(*subkey, into);" Or maybe you could just use **it?

> Source/WebCore/storage/IDBLevelDBCoding.cpp:481
> +            p = extractEncodedIDBKey(p, limit, &subkey);

ah, nice :)

> Source/WebCore/storage/IDBLevelDBCoding.cpp:537
> +            if (cmp)

again, feel free to save a line with "if (int cmp = ...)" here if you prefer.

> Source/WebCore/storage/IDBLevelDBCoding.cpp:541
> +                (lenA > lenB) ? 1 : 0;

nested ?:s make me scared, i'd just write it out as if-elseif-else.

> Source/WebCore/storage/IDBLevelDBCoding.h:59
> +void encodeIDBKey(const IDBKey&, Vector<char>&);

nit: I'd give a name to the vector parameter here; it's not obvious that it's "into".

> Source/WebKit/chromium/src/WebIDBKey.cpp:133
> +void convertToWebIDBKeyArray(WebVector<WebIDBKey>& result, const IDBKey::KeyArray& array)

i think it's more common to have the "output" parameter last.

> Source/WebKit/chromium/src/WebIDBKey.cpp:222
> +    // FIXME: This is probably sketchy with compound keys

why?
Comment 8 Joshua Bell 2011-11-01 10:35:40 PDT
Comment on attachment 113111 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=113111&action=review

Thanks, all good stuff. I'm working on the chromium/tests and browser_tests next.

>> Source/WebCore/ChangeLog:3
>> +        IndexedDB add() should accept array as key
> 
> maybe we should rename the bug to talk about compound keys instead

Done

>> Source/WebCore/bindings/v8/IDBBindingUtilities.cpp:79
>> +                    return subkey;
> 
> hmm, why?

I didn't want to throw away and re-create an invalid key during each step of the stack unwinding since there's a perfectly serviceable one at hand. I changed the if (!subkey) to ASSERT_NOT_REACHED. 

Alternately, I could avoid creating an invalid key at all in this method, let a NULL propagating up the stack flag invalid, and let the public create() method create an invalid key for the final result. Opinions?

>> Source/WebCore/storage/IDBLevelDBCoding.cpp:481
>> +            p = extractEncodedIDBKey(p, limit, &subkey);
> 
> ah, nice :)

As an aside, I'm slightly concerned about the about of Vector<char> churn (i.e. lots of small allocs/copies/frees) in the encode/decode/extract methods, but I suppose we'll fix that if/when it shows up on a profile.

>> Source/WebKit/chromium/src/WebIDBKey.cpp:222
>> +    // FIXME: This is probably sketchy with compound keys
> 
> why?

Early misunderstanding about how WebIDBKeys related to WebCore::IDBKeys; removed these notes-to-self.
Comment 9 Joshua Bell 2011-11-02 13:10:39 PDT
Created attachment 113357 [details]
Patch
Comment 10 Joshua Bell 2011-11-02 17:59:41 PDT
Created attachment 113412 [details]
Patch
Comment 11 Joshua Bell 2011-11-02 18:01:44 PDT
I think this is "done", review at leisure - I have corresponding chromium-side browser_tests ready to go as well.

Better name suggestions for the "codingToOrder" function in IDBLevelDBCoding.cpp appreciated
Comment 12 Joshua Bell 2011-11-02 22:07:48 PDT
Created attachment 113433 [details]
Patch
Comment 13 Joshua Bell 2011-11-02 22:08:44 PDT
Comment on attachment 113433 [details]
Patch

Added one more layout test which validates cursor result ordering and that IDBCursor.key yields correct values for array-type keys.
Comment 14 Hans Wennborg 2011-11-03 09:38:00 PDT
Comment on attachment 113433 [details]
Patch

LGTM with nits


View in context: https://bugs.webkit.org/attachment.cgi?id=113433&action=review

> Source/WebCore/storage/IDBLevelDBCoding.cpp:520
>  {

Or maybe KeyTypeByteToKeyType?

> Source/WebCore/storage/IDBLevelDBCoding.cpp:549
> +    if (int x = codingToOrder(typeB) - codingToOrder(typeA))

Maybe IDBKey should have a "static int compareTypes(IDBKey::Type a, IDBKey::Type b)" that does this.. as it is now, one needs to know what the enum inside IDBKey looks like to see why this subtraction does the right thing.

> Source/WebKit/chromium/src/WebIDBKey.cpp:96
> +PassRefPtr<IDBKey> convertFromWebIDBKeyArray(const WebVector<WebIDBKey>& array)

should this be static?

> Source/WebKit/chromium/src/WebIDBKey.cpp:122
> +void convertToWebIDBKeyArray(const IDBKey::KeyArray& array, WebVector<WebIDBKey>& result)

should this be static?

> Source/WebKit/chromium/src/WebIDBKey.cpp:124
> +    WebVector<WebIDBKey> keys(array.size()), subkeys;

I'd probably declare them on one line each

> LayoutTests/storage/indexeddb/cursor-key-order.html:31
> +    evalAndLog("openreq = indexedDB.open('db')");

we normally use the name of the layout test to name the database
Comment 15 Joshua Bell 2011-11-03 13:43:29 PDT
Created attachment 113551 [details]
Patch
Comment 16 Hans Wennborg 2011-11-04 04:18:14 PDT
(In reply to comment #15)
> Created an attachment (id=113551) [details]
> Patch

LGTM
Comment 17 Tony Chang 2011-11-04 10:59:44 PDT
Comment on attachment 113551 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=113551&action=review

> Source/WebCore/bindings/v8/IDBBindingUtilities.cpp:59
> +    PassRefPtr<IDBKey> createInternal(v8::Handle<v8::Value> value)
> +    {
> +        if (value->IsNumber() && !isnan(value->NumberValue()))

What's the benefit of having a KeyCreator class?  Couldn't it just be a static function?

> Source/WebCore/bindings/v8/IDBBindingUtilities.cpp:74
> +            for (uint32_t i = 0; i < length; i++) {

Nit: ++i

> Source/WebCore/bindings/v8/IDBBindingUtilities.cpp:91
> +    Vector<v8::Handle<v8::Array> > m_stack;

Why is this a member variable?

> Source/WebCore/bindings/v8/custom/V8IDBKeyCustom.cpp:63
> +    case IDBKey::ArrayType:
> +        array = v8::Array::New(key->array().size());
> +        for (size_t i = 0; i < key->array().size(); ++i)
> +            array->Set(i, toV8(key->array()[i].get()));
> +        return array;

Nit: I would declare array in the case block.  You'll need to use {} to scope the variable declartion.

> Source/WebCore/storage/IDBLevelDBCoding.cpp:305
> +int compareEncodedStringsWithLength(const char*& p, const char* limitP, const char*& q, const char* limitQ)

The pass by reference here seems unnecessary.  It looks like the changed values are only used in the unit test.  Can you instead ASSERT in this function that p + lenP*2 == limitP?

> Source/WebCore/storage/IDBLevelDBCoding.cpp:365
> +    size_t prevSize = into.size();
> +    size_t len;

Nit: Avoid abbreviations for variable names.  prevSize -> previousSize and len -> length.

> Source/WebCore/storage/IDBLevelDBCoding.cpp:379
> +    case IDBKey::ArrayType:
> +        into.append(encodeByte(kIDBKeyArrayTypeByte));
> +        len = key.array().size();
> +        into.append(encodeVarInt(len));
> +        for (size_t i = 0; i < len; ++i)
> +            encodeIDBKey(*key.array()[i], into);
> +        ASSERT_UNUSED(prevSize, into.size() > prevSize);
> +        return;

I would also put this in {} and scope the declaration of length.

> Source/WebCore/storage/IDBLevelDBCoding.cpp:410
> +    int64_t len;

Nit: len -> length

> Source/WebCore/storage/IDBLevelDBCoding.cpp:420
> +    case kIDBKeyArrayTypeByte:

{} and scoped variables

> Source/WebCore/storage/IDBLevelDBCoding.cpp:421
> +        // Array.

I would remove this comment.

> Source/WebCore/storage/IDBLevelDBCoding.cpp:471
> +    int64_t len;

len -> length

> Source/WebCore/storage/IDBLevelDBCoding.cpp:480
> +    case kIDBKeyArrayTypeByte:
> +        // Array.

Same comments as above.

> Source/WebCore/storage/IDBLevelDBCoding.cpp:498
>      case kIDBKeyStringTypeByte:
>          // String.
> -        p = decodeVarInt(p, limit, stringLen);
> +        p = decodeVarInt(p, limit, len);

I would scope stringLength and get rid of the comment.

> Source/WebCore/storage/IDBLevelDBCoding.cpp:534
> +    default:

No default: on switches so the compiler will give an error if you forget a case.  You'll need to move the ASSERT_NOT_REACHED and return InvalidType outside the switch to make some versions of gcc happy.

> Source/WebCore/storage/IDBLevelDBCoding.cpp:547
> +    int64_t lenA, lenB;

lenA -> lengthA, lenB -> lengthB.

> Source/WebCore/storage/IDBLevelDBCoding.cpp:558
> +    case kIDBKeyArrayTypeByte:
> +        // Array type.

Scoped local variables, no comment.

> Source/WebCore/storage/IDBLevelDBCoding.cpp:566
> +            if (int cmp = compareEncodedIDBKeys(p, limitA, q, limitB))

Is it possible to nest arrays in arrays?  If so, is there a limit?  Otherwise I think you can overflow the stack (too much recursion) if you nest lots of arrays.

> Source/WebKit/chromium/src/WebIDBKey.cpp:116
> +        default:
> +            ASSERT_NOT_REACHED();
> +            break;

No default:

> Source/WebKit/chromium/src/WebIDBKey.cpp:144
> +        default:
> +            ASSERT_NOT_REACHED();
> +            break;

No default:
Comment 18 Joshua Bell 2011-11-04 11:13:16 PDT
Comment on attachment 113551 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=113551&action=review

All the nits/style feedback is awesome, thanks.

>> Source/WebCore/bindings/v8/IDBBindingUtilities.cpp:59
>> +        if (value->IsNumber() && !isnan(value->NumberValue()))
> 
> What's the benefit of having a KeyCreator class?  Couldn't it just be a static function?

This class exists to support cycle detection, i.e. ensuring that arrays don't contain themselves. A class is used to contain the m_stack state, which tracks parent arrays during the recursion.

If we want to impose a depth limit on nested arrays we'd do so here (if m_stack.size() > SOME_LIMIT ... return Invalid) since this is where script data enters the IDBKey system.

> Source/WebCore/bindings/v8/IDBBindingUtilities.cpp:68
> +            if (m_stack.contains(array))

^^^^ That's the the bit where we prevent cycles.

>> Source/WebCore/bindings/v8/custom/V8IDBKeyCustom.cpp:63
>> +        return array;
> 
> Nit: I would declare array in the case block.  You'll need to use {} to scope the variable declartion.

Excellent, I much prefer that style, I just didn't see it used in nearby examples or the style guide.

>> Source/WebCore/storage/IDBLevelDBCoding.cpp:305
>> +int compareEncodedStringsWithLength(const char*& p, const char* limitP, const char*& q, const char* limitQ)
> 
> The pass by reference here seems unnecessary.  It looks like the changed values are only used in the unit test.  Can you instead ASSERT in this function that p + lenP*2 == limitP?

They're used when decoding arrays; the string may exist within a larger buffer containing several concatenated strings (or other values). That assertion would be invalid, as limitP is the end of the overall buffer. 

(The inputs to this function end up being two start and two end pointers; the outputs are the comparison result plus the two updated pointers, for subsequent decoding/comparison operations.)

>> Source/WebCore/storage/IDBLevelDBCoding.cpp:566

> 
> Is it possible to nest arrays in arrays?  If so, is there a limit?  Otherwise I think you can overflow the stack (too much recursion) if you nest lots of arrays.

See above re: nesting and limits. Suggestions on a limit? I can borrow from what SerializedScriptValue uses. (SSV's explicitly support cycles but do have depth limits)
Comment 19 Tony Chang 2011-11-04 13:13:19 PDT
Comment on attachment 113551 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=113551&action=review

>>> Source/WebCore/bindings/v8/IDBBindingUtilities.cpp:59
>>> +        if (value->IsNumber() && !isnan(value->NumberValue()))
>> 
>> What's the benefit of having a KeyCreator class?  Couldn't it just be a static function?
> 
> This class exists to support cycle detection, i.e. ensuring that arrays don't contain themselves. A class is used to contain the m_stack state, which tracks parent arrays during the recursion.
> 
> If we want to impose a depth limit on nested arrays we'd do so here (if m_stack.size() > SOME_LIMIT ... return Invalid) since this is where script data enters the IDBKey system.

Can we pass m_stack in as a parameter (pass-by-reference) to createInteral?  You could either make 2 static functions or keep the class and have the stack created in the create() method above.

As for a limit, it looks like SerializedScriptValue has a limit of 40,000 in JSC and 20,000 in V8.  I'm not sure what's appropriate for you, but 20,000 is probably plenty.  Make sure to add this case to your tests.

>>> Source/WebCore/storage/IDBLevelDBCoding.cpp:305
>>> +int compareEncodedStringsWithLength(const char*& p, const char* limitP, const char*& q, const char* limitQ)
>> 
>> The pass by reference here seems unnecessary.  It looks like the changed values are only used in the unit test.  Can you instead ASSERT in this function that p + lenP*2 == limitP?
> 
> They're used when decoding arrays; the string may exist within a larger buffer containing several concatenated strings (or other values). That assertion would be invalid, as limitP is the end of the overall buffer. 
> 
> (The inputs to this function end up being two start and two end pointers; the outputs are the comparison result plus the two updated pointers, for subsequent decoding/comparison operations.)

OK, that makes sense.  It makes me a bit nervous because it looks like in release builds, if an assert fails, we'll happily start pointing into random memory.  Should we be more cautious here or do we fully control all the serializing and unserializing code?  Do we have to worry about files being corrupted on disk?
Comment 20 Joshua Bell 2011-11-04 15:20:15 PDT
(In reply to comment #19)
> Can we pass m_stack in as a parameter (pass-by-reference) to createInteral?  You could either make 2 static functions or keep the class and have the stack created in the create() method above.

Ah, much cleaner, will do.
 
> As for a limit, it looks like SerializedScriptValue has a limit of 40,000 in JSC and 20,000 in V8.  I'm not sure what's appropriate for you, but 20,000 is probably plenty.  Make sure to add this case to your tests.

Agreed.
 
> > (The inputs to this function end up being two start and two end pointers; the outputs are the comparison result plus the two updated pointers, for subsequent decoding/comparison operations.)
> 
> OK, that makes sense.  It makes me a bit nervous because it looks like in release builds, if an assert fails, we'll happily start pointing into random memory.  Should we be more cautious here or do we fully control all the serializing and unserializing code?  Do we have to worry about files being corrupted on disk?

We control the code but agree we should be more paranoid. I've added an additional runtime check that p and q haven't gone past their limits, and bail if they have. This matches what the compareEncodedXXX() logic does if e.g. decodeVarInt wanders past the limit.
Comment 21 Joshua Bell 2011-11-04 16:19:48 PDT
Created attachment 113727 [details]
Patch
Comment 22 Tony Chang 2011-11-04 16:43:36 PDT
Comment on attachment 113727 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=113727&action=review

> LayoutTests/storage/indexeddb/key-type-array.html:207
> +    evalAndExpectException("indexedDB.cmp(makeArrayOfDepth(100000), 0)", "IDBDatabaseException.DATA_ERR");

You may want to double check that this runs quickly in debug, but I think you'll be fine (looks like in release it's only like 50ms or so to create the structure).
Comment 23 Joshua Bell 2011-11-04 16:48:07 PDT
Comment on attachment 113727 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=113727&action=review

>> LayoutTests/storage/indexeddb/key-type-array.html:207
>> +    evalAndExpectException("indexedDB.cmp(makeArrayOfDepth(100000), 0)", "IDBDatabaseException.DATA_ERR");
> 
> You may want to double check that this runs quickly in debug, but I think you'll be fine (looks like in release it's only like 50ms or so to create the structure).

Definitely slows down the test... looks like it takes ~650ms in Debug on my box. I'll drop the number to 50k (~350ms) which will still trip the depth check.
Comment 24 Joshua Bell 2011-11-08 11:59:20 PST
Created attachment 114128 [details]
Patch for landing
Comment 25 WebKit Review Bot 2011-11-08 12:00:06 PST
Comment on attachment 114128 [details]
Patch for landing

Rejecting attachment 114128 [details] from commit-queue.

jsbell@chromium.org does not have committer permissions according to http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/common/config/committers.py.

- If you do not have committer rights please read http://webkit.org/coding/contributing.html for instructions on how to use bugzilla flags.

- If you have committer rights please correct the error in Tools/Scripts/webkitpy/common/config/committers.py by adding yourself to the file (no review needed).  The commit-queue restarts itself every 2 hours.  After restart the commit-queue will correctly respect your committer rights.
Comment 26 Joshua Bell 2011-11-08 12:01:51 PST
Comment on attachment 114128 [details]
Patch for landing

drops the max depth to 50k

@tony - one more look?

(dgrogan and I tried "land-safely" to preserve the r+ but that does not appear to have done what we'd hoped)
Comment 27 Tony Chang 2011-11-08 12:21:55 PST
Comment on attachment 114128 [details]
Patch for landing

Note that the new patch has "Reviewed by Tony Chang." in it, so it doesn't need to be reviewed again.

That's really all land-safely does (change the OOPS to the reviewer name).
Comment 28 WebKit Review Bot 2011-11-08 12:51:33 PST
Comment on attachment 114128 [details]
Patch for landing

Clearing flags on attachment: 114128

Committed r99609: <http://trac.webkit.org/changeset/99609>
Comment 29 WebKit Review Bot 2011-11-08 12:51:40 PST
All reviewed patches have been landed.  Closing bug.