Bug 104354 - IndexedDB: Don't use strings to represent serialized values
Summary: IndexedDB: Don't use strings to represent serialized values
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 105186
  Show dependency treegraph
 
Reported: 2012-12-07 01:16 PST by Michael Pruett
Modified: 2012-12-17 16:00 PST (History)
6 users (show)

See Also:


Attachments
Patch (20.32 KB, patch)
2012-12-07 01:32 PST, Michael Pruett
no flags Details | Formatted Diff | Diff
Patch (20.32 KB, patch)
2012-12-07 01:38 PST, Michael Pruett
buildbot: commit-queue-
Details | Formatted Diff | Diff
Patch (20.60 KB, patch)
2012-12-11 15:27 PST, Michael Pruett
no flags Details | Formatted Diff | Diff
Patch (19.29 KB, patch)
2012-12-11 16:36 PST, Michael Pruett
haraken: review-
Details | Formatted Diff | Diff
Patch (22.19 KB, patch)
2012-12-15 07:44 PST, Michael Pruett
no flags Details | Formatted Diff | Diff
Patch (23.08 KB, patch)
2012-12-16 18:21 PST, Michael Pruett
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Pruett 2012-12-07 01:16:44 PST
Strings are inappropriate for representing serialized values in IndexedDB; these serialized values are not Unicode character sequences and should instead be represented as byte arrays. While the V8 implementation of SerializedScriptValue uses String as its internal representation, the JSC implementation uses Vector<uint8_t>. Converting a Vector<uint8_t> to a String object only for IndexedDB to convert it back immediately to a Vector<char> is awkward and inefficient.
Comment 1 Michael Pruett 2012-12-07 01:32:41 PST
Created attachment 178182 [details]
Patch
Comment 2 WebKit Review Bot 2012-12-07 01:35:41 PST
Attachment 178182 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1
Source/WebCore/bindings/v8/SerializedScriptValue.cpp:2257:  Extra space before ( in function call  [whitespace/parens] [4]
Total errors found: 1 in 12 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Michael Pruett 2012-12-07 01:38:13 PST
Created attachment 178184 [details]
Patch
Comment 4 Alec Flett 2012-12-07 09:45:09 PST
Comment on attachment 178184 [details]
Patch

Overall this is great - I was just saying to jsbell that this is an important one. (Note to self: we have to fix the V8 implementation to stop using String internally as well)

View in context: https://bugs.webkit.org/attachment.cgi?id=178184&action=review

> Source/WebCore/Modules/indexeddb/IDBLevelDBCoding.h:74
> +Vector<char> encodeIDBValue(const Vector<uint8_t>&);

I'm not sure I see why we need encodeIDBValue/decodeIDBValue now - seems like they're just introducing additional string copies to do conversion from char <-> uint8_t - is there a way to cleverly use static_cast and the like to make the conversion safe?

Alternatively, can we just fix the LevelDB code to use uint8_t?

> Source/WebCore/bindings/js/SerializedScriptValue.cpp:1844
>  String SerializedScriptValue::toWireString() const

add FIXME's to say "remove when..." for whatever is stopping this method (preferably in the form of a webkit bug) from being completely removed.. is it bindings code? Chromium code?

(and add the identical FIXME for all the methods that should be removed - I see a bunch)
Comment 5 Build Bot 2012-12-07 10:34:15 PST
Comment on attachment 178184 [details]
Patch

Attachment 178184 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/15186463
Comment 6 Michael Pruett 2012-12-11 15:27:22 PST
Created attachment 178894 [details]
Patch

With the changes in r137346, the toWireString() and createFromWire(const String&) methods of the JSC SerializedScriptValue class can be removed in this patch.
Comment 7 Alec Flett 2012-12-11 15:54:22 PST
Comment on attachment 178894 [details]
Patch

Can you include changes to LevelDBSlice() to accept a Vector<uint8_t> to avoid the extra copy that this patch introduces?
Comment 8 Alec Flett 2012-12-11 15:55:12 PST
(I think you'll have to use reinterpret_cast)
Comment 9 Joshua Bell 2012-12-11 15:58:10 PST
Comment on attachment 178894 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=178894&action=review

> Source/WebCore/Modules/indexeddb/IDBBackingStore.h:81
> +    virtual bool putRecord(IDBBackingStore::Transaction*, int64_t databaseId, int64_t objectStoreId, const IDBKey&, const Vector<uint8_t>& value, RecordIdentifier*);

Should still have WARN_UNUSED_RETURN?
Comment 10 Michael Pruett 2012-12-11 16:36:41 PST
Created attachment 178915 [details]
Patch

I've added back WARN_UNUSED_RETURN which I erroneously removed from IDBBackingStore::putRecord(). I have also eliminated encodeIDBValue() and decodeIDBValue() and have replaced those with calls to Vector::appendVector() and Vector::appendRange(). I've also updated IDBBackingStore::Cursor::value() to return a const Vector<uint8_t>& rather than Vector<uint8_t>. I believe these changes should eliminate the extra memory copies introduced in my previous patch.
Comment 11 Alec Flett 2012-12-12 13:35:12 PST
Comment on attachment 178915 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=178915&action=review

> Source/WebCore/bindings/js/SerializedScriptValue.h:110
> +        return adopt(bytes);

I'm confused how/why this works - I'm not familiar with the JSC SSV implementation, but aren't you adopting a stack-allocated Vector? or maybe can you just adopt data instead? 

(ah.. it looks like the constructor is relying on swap(), but maybe it should just be copying it in SerializedScriptValue::SerializedScriptValue(Vector<uint8_t>& buffer) instead of swapping?

> Source/WebCore/bindings/v8/SerializedScriptValue.cpp:2262
> +        dst[i] = ntohs(src[i]);

I'm actually now a little confused as to why the ntohs is necessary... This may be the most prudent fix for right now, but could this also be fixed by the V8 SSV parser/serializer simply unpacking/packing bytes in the right order? (it seems unfortunate that the act of creating or extracting an SSV requires this conversion, and that that cost should be paid by the parser/serializer in case the SSV is never deserialized)

> Source/WebCore/bindings/v8/SerializedScriptValue.cpp:2324
> +        dst[i] = htons(src[i]);

(ditto for htons here)
Comment 12 Michael Pruett 2012-12-12 14:10:58 PST
(In reply to comment #11)
> (From update of attachment 178915 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=178915&action=review
> 
> > Source/WebCore/bindings/js/SerializedScriptValue.h:110
> > +        return adopt(bytes);
> 
> I'm confused how/why this works - I'm not familiar with the JSC SSV implementation, but aren't you adopting a stack-allocated Vector? or maybe can you just adopt data instead? 
> 
> (ah.. it looks like the constructor is relying on swap(), but maybe it should just be copying it in SerializedScriptValue::SerializedScriptValue(Vector<uint8_t>& buffer) instead of swapping?

I agree that adding a constructor which takes a const Vector<uint8_t>& would be cleaner.

> > Source/WebCore/bindings/v8/SerializedScriptValue.cpp:2262
> > +        dst[i] = ntohs(src[i]);
> 
> I'm actually now a little confused as to why the ntohs is necessary... This may be the most prudent fix for right now, but could this also be fixed by the V8 SSV parser/serializer simply unpacking/packing bytes in the right order? (it seems unfortunate that the act of creating or extracting an SSV requires this conversion, and that that cost should be paid by the parser/serializer in case the SSV is never deserialized)
> 
> > Source/WebCore/bindings/v8/SerializedScriptValue.cpp:2324
> > +        dst[i] = htons(src[i]);
> 
> (ditto for htons here)

I have added byte swapping to the V8 implementation of SSV's toWireBytes() and createFromWire(const Vector<uint8_t>&) in order to maintain binary compatibility with existing IDB databases. In the previous call path, the string in toWireString() and createFromWire(const String&) was swapped in IDBLevelDBCoding.h's encodeString() and decodeString().

Certainly swapping for deserialization could be performed at a later stage such as in SerializedScriptValue::deserialize() if performance is a concern, but that approach would be significantly more complex. I believe the approach in the proposed patch has equivalent performance to the status quo as every value which was written to or read from a LevelDB database was swapped.
Comment 13 Alec Flett 2012-12-12 14:35:57 PST
> Certainly swapping for deserialization could be performed at a later stage such as in SerializedScriptValue::deserialize() if performance is a concern, but that approach would be significantly more complex. I believe the approach in the proposed patch has equivalent performance to the status quo as every value which was written to or read from a LevelDB database was swapped.

Yeah, that's kind of what I figured - I just wanted to confirm. The parser is a total mess right now anyway (it manually puts the value into a String, pulls out 16-bit characters one at a time, and then parses them as a sequence of 8-bit characters. so no sense trying to make that better right now.
Comment 14 Kentaro Hara 2012-12-12 23:29:36 PST
Comment on attachment 178915 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=178915&action=review

BTW, don't you have any performance concern about this change? (Is there any benchmark that we can use to make sure that this change won't regress performance?)

> Source/WebCore/Modules/indexeddb/IDBBackingStore.cpp:1406
> +    Vector<uint8_t> m_emptyValue;

Do you need this member?

> Source/WebCore/Modules/indexeddb/IDBBackingStore.cpp:1528
> +    Vector<uint8_t> m_emptyValue;

Ditto.

> Source/WebCore/Modules/indexeddb/IDBBackingStore.cpp:1610
> +    Vector<uint8_t> m_value;

Nit: Rename to m_currentValue, as you're using m_currentValue above.
Comment 15 Kentaro Hara 2012-12-12 23:32:03 PST
Comment on attachment 178915 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=178915&action=review

(It looks like a couple of my comments were gone away...)

> Source/WebCore/bindings/js/SerializedScriptValue.h:107
> +    static PassRefPtr<SerializedScriptValue> createFromWire(const Vector<uint8_t>& data)

It's a bit confusing that there are two createFromWire()s. Shall we rename the method?

> Source/WebCore/bindings/js/SerializedScriptValue.h:112
> +    const Vector<uint8_t>& toWireBytes() const { return m_data; }

Is this correct? You're implementing toWireBytes() in the cpp file too.

> Source/WebCore/bindings/v8/SerializedScriptValue.cpp:2326
> +    return result;

Can we avoid copying the vector ?
Comment 16 Michael Pruett 2012-12-15 07:41:42 PST
(In reply to comment #15)
> (From update of attachment 178915 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=178915&action=review
> 
> (It looks like a couple of my comments were gone away...)
> 
> > Source/WebCore/bindings/js/SerializedScriptValue.h:107
> > +    static PassRefPtr<SerializedScriptValue> createFromWire(const Vector<uint8_t>& data)
> 
> It's a bit confusing that there are two createFromWire()s. Shall we rename the method?

Sure, I'll rename it to createFromWireBytes().

> > Source/WebCore/bindings/js/SerializedScriptValue.h:112
> > +    const Vector<uint8_t>& toWireBytes() const { return m_data; }
> 
> Is this correct? You're implementing toWireBytes() in the cpp file too.

Yes I believe this is correct. The toWireBytes() method is defined in the header file for the JSC SerializedScriptValue but in the source file for the V8 SerializedScriptValue.

> > Source/WebCore/bindings/v8/SerializedScriptValue.cpp:2326
> > +    return result;
> 
> Can we avoid copying the vector ?

I don't think so, but the copying that is now happening here used to happen in encodeString(), so I don't think there's any change.
Comment 17 Michael Pruett 2012-12-15 07:44:47 PST
Created attachment 179599 [details]
Patch
Comment 18 Kentaro Hara 2012-12-16 16:38:27 PST
Comment on attachment 179599 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=179599&action=review

> Source/WebCore/Modules/indexeddb/IDBBackingStore.cpp:1392
> +    virtual const Vector<uint8_t>* value() const { ASSERT_NOT_REACHED(); return 0; }

I would prefer returning 'const Vector<uint8_t>&' because m_currentValue is a member variable. Here you can write:

  virtual const Vector<uint8_t>& value() const { ASSERT_NOT_REACHED(); return new Vector<uint8_t>(); }

Performance of this code wouldn't be important as this is a code that must not be reached.

> Source/WebCore/Modules/indexeddb/IDBBackingStore.cpp:1447
> +    virtual const Vector<uint8_t>* value() const { return &m_currentValue; }

Ditto.

> Source/WebCore/Modules/indexeddb/IDBBackingStore.cpp:1508
> +    virtual const Vector<uint8_t>* value() const { ASSERT_NOT_REACHED(); return 0; }

Ditto.

> Source/WebCore/Modules/indexeddb/IDBBackingStore.cpp:1587
> +    virtual const Vector<uint8_t>* value() const { return &m_currentValue; }

Ditto.

> Source/WebCore/Modules/indexeddb/IDBBackingStore.h:118
> +        virtual const Vector<uint8_t>* value() const = 0;

Ditto.

> Source/WebCore/bindings/js/SerializedScriptValue.h:110
> +        Vector<uint8_t> bytes(data);
> +        return adopt(bytes);

Let's apply the comment from Alec Flett.

> Source/WebCore/bindings/v8/SerializedScriptValue.cpp:2262
> +        dst[i] = ntohs(src[i]);

Alec Flett: Does this ntohs() looks OK ?
Comment 19 Michael Pruett 2012-12-16 18:21:05 PST
Created attachment 179678 [details]
Patch

I've changed the return type of IDBBackingStore::Cursor::value() from const Vector<uint8_t>* to const Vector<uint8_t>& and have added a constructor for JSC SerializedScriptValue which takes const Vector<uint8_t>& rather than Vector<uint8_t>&.
Comment 20 Kentaro Hara 2012-12-16 18:23:09 PST
Comment on attachment 179678 [details]
Patch

LGTM. Let's wait for alec's comment before landing.
Comment 21 Alec Flett 2012-12-17 09:02:18 PST
yes this is much better.

You'll notice I filed bug 105186 to address my own note to self in comment 8.
Comment 22 WebKit Review Bot 2012-12-17 16:00:53 PST
Comment on attachment 179678 [details]
Patch

Clearing flags on attachment: 179678

Committed r137954: <http://trac.webkit.org/changeset/137954>
Comment 23 WebKit Review Bot 2012-12-17 16:00:58 PST
All reviewed patches have been landed.  Closing bug.