Bug 99310 - [JSC] Indexeddb unable to deserialize the string record
Summary: [JSC] Indexeddb unable to deserialize the string record
Status: RESOLVED INVALID
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Major
Assignee: Charles Wei
URL:
Keywords:
: 102293 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-10-15 05:15 PDT by Charles Wei
Modified: 2013-03-25 23:42 PDT (History)
9 users (show)

See Also:


Attachments
indexeddb test case (15.57 KB, text/html)
2012-10-15 05:16 PDT, Charles Wei
no flags Details
Patch (2.35 KB, patch)
2012-10-15 05:59 PDT, Charles Wei
no flags Details | Formatted Diff | Diff
Patch (2.59 KB, patch)
2012-11-12 22:19 PST, Charles Wei
no flags Details | Formatted Diff | Diff
Patch (2.93 KB, patch)
2012-11-13 19:30 PST, Charles Wei
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Charles Wei 2012-10-15 05:15:51 PDT
The indexeddb for jsc binding failed to work (tested with BlackBerry porting), with the log:

Type error: unable to deserialize data.
Comment 1 Charles Wei 2012-10-15 05:16:35 PDT
Created attachment 168683 [details]
indexeddb test case
Comment 2 Charles Wei 2012-10-15 05:59:34 PDT
Created attachment 168694 [details]
Patch
Comment 3 Rob Buis 2012-10-15 06:47:33 PDT
Comment on attachment 168694 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=168694&action=review

I think someone with more knowledge about IndexDB should review.

> Source/WebCore/ChangeLog:3
> +        Indexeddb doesn't wrong with the attached test case and "Unable to deserialize data"

Title seems odd?
Comment 4 Charles Wei 2012-10-15 07:07:17 PDT
(In reply to comment #3)
> (From update of attachment 168694 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=168694&action=review
> 
> I think someone with more knowledge about IndexDB should review.
> 
> > Source/WebCore/ChangeLog:3
> > +        Indexeddb doesn't wrong with the attached test case and "Unable to deserialize data"
> 
> Title seems odd?

Sorry, that comes from the wrong title of the bug,which has been fixed. Will resubmit the patch with the correct title by s/wrong/work
Comment 5 Joshua Bell 2012-10-15 11:08:26 PDT
Comment on attachment 168694 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=168694&action=review

> Source/WebCore/ChangeLog:15
> +        The test case is attached at: https://bugs.webkit.org/show_bug.cgi?id=99310

Please convert the test case to a LayoutTest in storage/indexeddb so regressions will be caught automatically when JSC ports enable IDB.

> Source/WebCore/bindings/js/SerializedScriptValue.cpp:1008
> +        // If the value contains odd number of content, append UndefinedTag at the end

Just to clarify what's going on:

"content" => "bytes"

> Source/WebCore/bindings/js/SerializedScriptValue.cpp:1010
> +        // String is UChar (2 types) internal and takes even number of value.

"types" => "bytes"
"value" => "bytes"

> Source/WebCore/bindings/js/SerializedScriptValue.cpp:1012
> +            writeLittleEndian<uint8_t>(const_cast<Vector<uint8_t>& >(value), UndefinedTag);

This is violating the "const" constraint and modifying the vector that is passed in, which is bad.

Can you find another way to do this?

I haven't traced through to see if adding an extra UndefinedTag at the end of the data is safe; the JSC toWireString(Vector<char>)/create(String) code is much more complicated than it seems like it needs to be, so there's stuff going on I'm not familiar with.

Why doesn't this issue affect non-IDB use of SerializedScriptValue?
Comment 6 Charles Wei 2012-11-12 21:41:56 PST
Comment on attachment 168694 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=168694&action=review

>> Source/WebCore/ChangeLog:15
>> +        The test case is attached at: https://bugs.webkit.org/show_bug.cgi?id=99310
> 
> Please convert the test case to a LayoutTest in storage/indexeddb so regressions will be caught automatically when JSC ports enable IDB.

Actually the existing test cases cover this case.  This happens when IndexedDB.put('string', 'key');  The following existing test cases fail without the patch and succeed with the fix.

LayoutTests/storage/indexeddb/cursor_delete.html
LayoutTests/storage/indexeddb/cursor_inconsistency.html
LayoutTests/storage/indexeddb/cursor_prev_no_duplicate.html 

because the internal of the SerializedScriptValue for String is:

<version,2 bytes> <StringTag, 1 byte> <Length 2 bytes> <UChar 2*n bytes>

That is an odd bytes.  While when persist the value, we call String SerializedScriptValue::toWireString() to make it a String for persistence. if we don't pad with one byte, we lose the last byte of the SerializedScriptValue and can't convert it back with SerializedScriptValue::createFromWire() and SerializedScriptValue::deserialize().

>> Source/WebCore/bindings/js/SerializedScriptValue.cpp:1008
>> +        // If the value contains odd number of content, append UndefinedTag at the end
> 
> Just to clarify what's going on:
> 
> "content" => "bytes"

ok.

>> Source/WebCore/bindings/js/SerializedScriptValue.cpp:1010
>> +        // String is UChar (2 types) internal and takes even number of value.
> 
> "types" => "bytes"
> "value" => "bytes"

Ok.

>> Source/WebCore/bindings/js/SerializedScriptValue.cpp:1012
>> +            writeLittleEndian<uint8_t>(const_cast<Vector<uint8_t>& >(value), UndefinedTag);
> 
> This is violating the "const" constraint and modifying the vector that is passed in, which is bad.
> 
> Can you find another way to do this?
> 
> I haven't traced through to see if adding an extra UndefinedTag at the end of the data is safe; the JSC toWireString(Vector<char>)/create(String) code is much more complicated than it seems like it needs to be, so there's stuff going on I'm not familiar with.
> 
> Why doesn't this issue affect non-IDB use of SerializedScriptValue?

There's no other better way to cast away the const, another way is to remove the const from the argument. 

why doesn't this issue affect non-IDB use ? Because there's no others are using this except IDB. toWireString and createFromWire are inside INDEXEDDB macro.
Comment 7 Charles Wei 2012-11-12 22:19:08 PST
Created attachment 173823 [details]
Patch
Comment 8 Yong Li 2012-11-13 07:10:55 PST
Comment on attachment 173823 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=173823&action=review

> Source/WebCore/bindings/js/SerializedScriptValue.cpp:1011
> +        if (value.size() % sizeof(UChar))

I remember Darin told me before UChar could be 4 bytes?
Comment 9 Yong Li 2012-11-13 07:14:10 PST
Comment on attachment 173823 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=173823&action=review

> Source/WebCore/bindings/js/SerializedScriptValue.cpp:1012
> +            writeLittleEndian<uint8_t>(const_cast<Vector<uint8_t>& >(value), NullTag);

I don't like this const_cast, nor the unneeded null in String. Why don't we change "length" instead?
Comment 10 Yong Li 2012-11-13 07:38:54 PST
Comment on attachment 173823 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=173823&action=review

>> Source/WebCore/bindings/js/SerializedScriptValue.cpp:1012
>> +            writeLittleEndian<uint8_t>(const_cast<Vector<uint8_t>& >(value), NullTag);
> 
> I don't like this const_cast, nor the unneeded null in String. Why don't we change "length" instead?

I was wrong about "length". But still, don't like the const_cast here.
Comment 11 Charles Wei 2012-11-13 08:24:18 PST
Comment on attachment 173823 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=173823&action=review

>> Source/WebCore/bindings/js/SerializedScriptValue.cpp:1011
>> +        if (value.size() % sizeof(UChar))
> 
> I remember Darin told me before UChar could be 4 bytes?

In what scenario that Uchar is 4 bytes?

>>> Source/WebCore/bindings/js/SerializedScriptValue.cpp:1012
>>> +            writeLittleEndian<uint8_t>(const_cast<Vector<uint8_t>& >(value), NullTag);
>> 
>> I don't like this const_cast, nor the unneeded null in String. Why don't we change "length" instead?
> 
> I was wrong about "length". But still, don't like the const_cast here.

Just changing the length will leave an arbitrary byte at the end which might be interpreted differently. we intentionally put a NullTag at the end to make it even bytes, otherwise we will lose one bytes. The padding NullTag won't be interpreted to anything meaningful.

I also don't like the const_cast,but if we don't use it, we have to make a copy of the data and operate on the copy which is an even worse solution I think. We have to use if if we have no other better solutions. Or what do you suggest, yong?
Comment 12 Joshua Bell 2012-11-13 08:55:40 PST
Note that webkit.org/b/45110 has a fix for the same problem and doesn't do a const_cast - it does a data copy. While that's unfortunate I don't think a const_cast is acceptable because it breaks the API contract and could affect unrelated functions. Either doing a copy or changing the function signature is a better approach.
Comment 13 Yong Li 2012-11-13 09:01:30 PST
(In reply to comment #12)
> Note that webkit.org/b/45110 has a fix for the same problem and doesn't do a const_cast - it does a data copy. While that's unfortunate I don't think a const_cast is acceptable because it breaks the API contract and could affect unrelated functions. Either doing a copy or changing the function signature is a better approach.

Or CloneDeserializer::readString(start, end, str, length - 1), and then make up an 1-char buffer for the last char?

Also I'm not sure adding NullTag is safe. Charles, could we just ignore the last char? Or use "?"
Comment 14 Charles Wei 2012-11-13 19:30:40 PST
Created attachment 174052 [details]
Patch
Comment 15 Charles Wei 2012-11-13 19:32:11 PST
(In reply to comment #12)
> Note that webkit.org/b/45110 has a fix for the same problem and doesn't do a const_cast - it does a data copy. While that's unfortunate I don't think a const_cast is acceptable because it breaks the API contract and could affect unrelated functions. Either doing a copy or changing the function signature is a better approach.

Ok, the new patch just does a copy and operates on the copy instead of the original data, even though I don't like this solution.
Comment 16 Charles Wei 2012-11-13 19:33:35 PST
(In reply to comment #13)
> (In reply to comment #12)
> > Note that webkit.org/b/45110 has a fix for the same problem and doesn't do a const_cast - it does a data copy. While that's unfortunate I don't think a const_cast is acceptable because it breaks the API contract and could affect unrelated functions. Either doing a copy or changing the function signature is a better approach.
> 
> Or CloneDeserializer::readString(start, end, str, length - 1), and then make up an 1-char buffer for the last char?
> 
> Also I'm not sure adding NullTag is safe. Charles, could we just ignore the last char? Or use "?"

Adding NullTag is safer than anything else, I think.  we can't just ignore the last char, otherwise, it will be an arbitrary byte and unpredictable.
Comment 17 Oliver Hunt 2012-11-14 09:43:44 PST
Let me have a look at this, I need to work out what's going on, and the serialization code shouldn't be having any IDB specific issues.  For reference can you tell me what architecture and byte order blackberry uses?

Also UChar are only 2 bytes, a character may be four bytes (two UChar's), this is one of those great character vs. code point things.  We also like to try and store strings using a byte characters where possible.

Oh, and all changes to serialization should bump the version number.
Comment 18 Joshua Bell 2012-11-14 09:49:20 PST
SerializedScriptValue serializes to a Vector<char> - an opaque binary blob - then tries to store that result in a String (in toWireString). If the length of the Vector<char> is odd the data is truncated by one byte.

Nothing to do with code points vs. characters. It's purely about storing an opaque blob of binary data in a String and an issue with odd lengths. 

There's nothing IDB specific about this, but perhaps IDB is the only thing exercising the toWireString() method.
Comment 19 Joshua Bell 2012-11-14 09:54:19 PST
(In reply to comment #18)
> Nothing to do with code points vs. characters.

(er, code units vs. code points; sorry, caffeine is still kicking in.)
Comment 20 Oliver Hunt 2012-11-14 10:09:47 PST
(In reply to comment #19)
> (In reply to comment #18)
> > Nothing to do with code points vs. characters.
> 
> (er, code units vs. code points; sorry, caffeine is still kicking in.)

I was responding to the "UChar could be four bytes" comment.

Bigger question: wtf is idb using Strings as its storage format?  It's storing a binary blob, not a string.
Comment 21 Joshua Bell 2012-11-14 10:20:46 PST
(In reply to comment #20)
> Bigger question: wtf is idb using Strings as its storage format?  It's storing a binary blob, not a string.

Now that's a good question... the code around that hasn't been touched in the time I've been working on IDB. It doesn't look like there's a good reason; it's just being used as a length + bytes mechanism for serialization. Would be easy to change, apart from back-compat (we still need to be able to read old data that's already on disk).

From an IDB perspective it would make sense to remove the toWireString/createFromWire(String) methods in favor of producing/consuming Vector<char>.
Comment 22 Joshua Bell 2012-11-14 15:42:48 PST
*** Bug 102293 has been marked as a duplicate of this bug. ***
Comment 23 Charles Wei 2012-11-14 18:30:45 PST
(In reply to comment #21)
> (In reply to comment #20)
> > Bigger question: wtf is idb using Strings as its storage format?  It's storing a binary blob, not a string.
> 
> Now that's a good question... the code around that hasn't been touched in the time I've been working on IDB. It doesn't look like there's a good reason; it's just being used as a length + bytes mechanism for serialization. Would be easy to change, apart from back-compat (we still need to be able to read old data that's already on disk).
> 
> From an IDB perspective it would make sense to remove the toWireString/createFromWire(String) methods in favor of producing/consuming Vector<char>.

yes,  agree.  toWireString actually goes to a String and createFromWire's input is String,  that's what it is now , for both JSC and V8 bindings,  even though I don't think it's reasonable and what's the background of that design.  What this patch is about is to make it work with this mechanism,  refactor of this is out of the scope of this patch.
Comment 24 George Staikos 2012-11-19 21:43:06 PST
(In reply to comment #23)
> (In reply to comment #21)
> > (In reply to comment #20)
> > > Bigger question: wtf is idb using Strings as its storage format?  It's storing a binary blob, not a string.
> > 
> > Now that's a good question... the code around that hasn't been touched in the time I've been working on IDB. It doesn't look like there's a good reason; it's just being used as a length + bytes mechanism for serialization. Would be easy to change, apart from back-compat (we still need to be able to read old data that's already on disk).
> > 
> > From an IDB perspective it would make sense to remove the toWireString/createFromWire(String) methods in favor of producing/consuming Vector<char>.
> 
> yes,  agree.  toWireString actually goes to a String and createFromWire's input is String,  that's what it is now , for both JSC and V8 bindings,  even though I don't think it's reasonable and what's the background of that design.  What this patch is about is to make it work with this mechanism,  refactor of this is out of the scope of this patch.

I think Charles is right.  This patch looks like it fixes the bug, but we all seem to agree that it should move away from String at some point.  Any objections?
Comment 25 Charles Wei 2012-11-19 22:02:54 PST
Just ran the test on Mac , with and without this patch. The result shows that with the patch,  no regressions found, instead, some failed test cases succeeded now.

details:

Tests that failed text/pixel/audio diff (13)  :  same as without the patch.
Tests that had stderr output (1):  It was 8 without the patch.
Flaky tests (failed the first run and passed on retry) (13):  it was 17 without the patch.
Tests expected to fail but passed (5):  same as without the patch.
Comment 26 Joe Mason 2013-03-15 06:19:50 PDT
(In reply to comment #24)
> I think Charles is right.  This patch looks like it fixes the bug, but we all seem to agree that it should move away from String at some point.  Any objections?

Seems to me that if this fixes the bug it should be committed ASAP and we should file a followup bug to convert away from String.
Comment 27 Michael Pruett 2013-03-15 10:04:42 PDT
(In reply to comment #26)
> Seems to me that if this fixes the bug it should be committed ASAP and we should file a followup bug to convert away from String.

As of the changes made in bug 104354, IndexedDB serialization no longer uses strings. This bug is no longer relevant and should be closed.
Comment 28 Charles Wei 2013-03-25 23:40:55 PDT
This patch doesn't seem to apply anymore with the change upstream.