Implement IDBKeyPath parser.
Created attachment 62579 [details] Patch
Comment on attachment 62579 [details] Patch WebCore/storage/IndexedDatabaseRequest.idl:36 + // FIXME: IDBKeyPath is public only to allow testing. FIXMEs should have an action...i.e. delete as soon as we can because it's only here for testing...fix elsewhere. LayoutTests/storage/indexeddb/idb-keypath-expected.txt:17 + indexedDB.makeKeyPath('a[34][20].foo[2].bar') maybe insert a newline via "debug('');" between test cases? (here and elsewhere) LayoutTests/storage/indexeddb/script-tests/idb-keypath.js:9 + for (;keyPath.hasNext(); keyPath.next(), ++i) { space before keyPath...but why not put the var i = 0 in there? WebCore/storage/IDBKeyPath.cpp:50 + Token token; What does this do? WebCore/storage/IDBKeyPath.cpp:63 + void IDBKeyPath::next() I guess this next interface stuff makes sense in terms of easily testing (without building a lot of infrastructure that's just going to go away) but for the real implementation, I don't see why we don't just give a pointer to the raw array + size check. This stuff seems overly complicated. WebCore/storage/IDBKeyPath.cpp:173 + return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); this is a subset of javascript identifiers Do we really want to support array syntax within keyRanges? Also, what do we do when we see malformed keyPaths? Right now it seems we try our best to parse them, but I don't think this is the right behavior. I think we should raise an exception on parse errors. (The create function should return 0 and the caller would raise.) All of this really needs to be specced out tho. Haven't looked at the parser or tokenizer in detail yet. To be honest, it feels a little heavyweight for our needs, but I'm guessing it'll pay off in the long run. We'll probably want multiple eyes looking at it though...since it'd be one of the easier ways to introduce security issues, I'd think.
Created attachment 62593 [details] Patch
thanks for the comments so far, Jeremy! brief background / reasoning: 1. not sure whether or not we'll support arrays, but it seems that if we support named properties, we should support indexed values as well. 2. the reason for a separate class is only to expose for tests, since we have no way to exercise this code (yet). 3. just passing the array with a known struct will be fine, the API is mostly exposed for tests. however, the basic concept would still be the same: I think it'd be nice to have the "SerializedScriptValueExtractor" just traverse some kind of structure: get a name or index, de-serialize, repeat.. having this separation would allow us to decouple the the extractor from the parser. (In reply to comment #2) > (From update of attachment 62579 [details]) > WebCore/storage/IndexedDatabaseRequest.idl:36 > + // FIXME: IDBKeyPath is public only to allow testing. > FIXMEs should have an action...i.e. delete as soon as we can because it's only here for testing...fix elsewhere. Done. > > LayoutTests/storage/indexeddb/idb-keypath-expected.txt:17 > + indexedDB.makeKeyPath('a[34][20].foo[2].bar') > maybe insert a newline via "debug('');" between test cases? (here and elsewhere) Done in the "check" function above. > > > > LayoutTests/storage/indexeddb/script-tests/idb-keypath.js:9 > + for (;keyPath.hasNext(); keyPath.next(), ++i) { > space before keyPath...but why not put the var i = 0 in there? done. > > > > WebCore/storage/IDBKeyPath.cpp:50 > + Token token; > What does this do? duh, leftover. removed. > > WebCore/storage/IDBKeyPath.cpp:63 > + void IDBKeyPath::next() > I guess this next interface stuff makes sense in terms of easily testing (without building a lot of infrastructure that's just going to go away) but for the real implementation, I don't see why we don't just give a pointer to the raw array + size check. This stuff seems overly complicated. Agreed. See above. > > WebCore/storage/IDBKeyPath.cpp:173 > + return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); > this is a subset of javascript identifiers Not yet done, I'll try to find the authoritative list of identifiers. > > > > > Do we really want to support array syntax within keyRanges? Also, what do we do when we see malformed keyPaths? Right now it seems we try our best to parse them, but I don't think this is the right behavior. I think we should raise an exception on parse errors. (The create function should return 0 and the caller would raise.) All of this really needs to be specced out tho. Indeed. The keyPath is not yet fully specced, so I'm flying a bit blind here. However, I think the main purpose of decoupling the "SerializedScriptValueExtractor" from the "KeyPathParser" is a "Good Thing", and will allow us to move to a point where the "Extractor"s (both JSC and V8) don't care about what a KeyPath is, as long as it provides an iterator of property names / indexes. thoughts? ahn, I also exposed a "parserError", the caller can do whatever we decide (either throw an exception, ignore, use what was able to parse, ...) > > Haven't looked at the parser or tokenizer in detail yet. To be honest, it feels a little heavyweight for our needs, but I'm guessing it'll pay off in the long run. We'll probably want multiple eyes looking at it though...since it'd be one of the easier ways to introduce security issues, I'd think. as mentioned somewhere else, there are two parts that worry me: 1. parsing the keypath (this bit). 2. extracting the value of a serializedscriptvalue.. both will work on unsafe, external data, and the more eyes (and tests!) we get the better. feel free to add anyone else to this bug.
(In reply to comment #4) > thanks for the comments so far, Jeremy! > > brief background / reasoning: > 1. not sure whether or not we'll support arrays, but it seems that if we support named properties, we should support indexed values as well. > 2. the reason for a separate class is only to expose for tests, since we have no way to exercise this code (yet). > 3. just passing the array with a known struct will be fine, the API is mostly exposed for tests. however, the basic concept would still be the same: I think it'd be nice to have the "SerializedScriptValueExtractor" just traverse some kind of structure: get a name or index, de-serialize, repeat.. having this separation would allow us to decouple the the extractor from the parser. I agree they should be separate. But is there any possible way we can make the "deserialize" code within SerializedScriptValue use the same extractor? > (In reply to comment #2) > > (From update of attachment 62579 [details] [details]) > > WebCore/storage/IndexedDatabaseRequest.idl:36 > > + // FIXME: IDBKeyPath is public only to allow testing. > > FIXMEs should have an action...i.e. delete as soon as we can because it's only here for testing...fix elsewhere. > Done. > > > > > LayoutTests/storage/indexeddb/idb-keypath-expected.txt:17 > > + indexedDB.makeKeyPath('a[34][20].foo[2].bar') > > maybe insert a newline via "debug('');" between test cases? (here and elsewhere) > Done in the "check" function above. > > > > > > > > > LayoutTests/storage/indexeddb/script-tests/idb-keypath.js:9 > > + for (;keyPath.hasNext(); keyPath.next(), ++i) { > > space before keyPath...but why not put the var i = 0 in there? > done. > > > > > > > > > WebCore/storage/IDBKeyPath.cpp:50 > > + Token token; > > What does this do? > duh, leftover. removed. > > > > > WebCore/storage/IDBKeyPath.cpp:63 > > + void IDBKeyPath::next() > > I guess this next interface stuff makes sense in terms of easily testing (without building a lot of infrastructure that's just going to go away) but for the real implementation, I don't see why we don't just give a pointer to the raw array + size check. This stuff seems overly complicated. > > Agreed. See above. > > > > > WebCore/storage/IDBKeyPath.cpp:173 > > + return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); > > this is a subset of javascript identifiers > Not yet done, I'll try to find the authoritative list of identifiers. > > > > > > > > > > > > Do we really want to support array syntax within keyRanges? Also, what do we do when we see malformed keyPaths? Right now it seems we try our best to parse them, but I don't think this is the right behavior. I think we should raise an exception on parse errors. (The create function should return 0 and the caller would raise.) All of this really needs to be specced out tho. > > Indeed. The keyPath is not yet fully specced, so I'm flying a bit blind here. > However, I think the main purpose of decoupling the "SerializedScriptValueExtractor" from the "KeyPathParser" is a "Good Thing", and will allow us to move to a point where the "Extractor"s (both JSC and V8) don't care about what a KeyPath is, as long as it provides an iterator of property names / indexes. > thoughts? > > ahn, I also exposed a "parserError", the caller can do whatever we decide (either throw an exception, ignore, use what was able to parse, ...) But if there's an error, should we even return an object? > > > > Haven't looked at the parser or tokenizer in detail yet. To be honest, it feels a little heavyweight for our needs, but I'm guessing it'll pay off in the long run. We'll probably want multiple eyes looking at it though...since it'd be one of the easier ways to introduce security issues, I'd think. > > as mentioned somewhere else, there are two parts that worry me: > 1. parsing the keypath (this bit). > 2. extracting the value of a serializedscriptvalue.. > > both will work on unsafe, external data, and the more eyes (and tests!) we get the better. > feel free to add anyone else to this bug.
> indexedDB.makeKeyPath('foo.bar.zoo') Hmm, makeKeyPath() is not a method that exists in the spec. Rather than extending the indexedDB object with methods that are intended solely for testing, perhaps we should add a couple of methods to LayoutTestController instead? > module storage { > // FIXME: IDBKeyPath is public only to allow testing, remove as soon as we have more comprehensive > test infrastructure for IndexedDB. > interface [ > Conditional=INDEXED_DATABASE > ] IDBKeyPath { Oh, I see. Looking around, I cannot find any precedent for this. So maybe it's better to have a LTC method that returns true / false depending on whether the input is a valid keypath and was successfully parsed or not? I realize this doesn't allow as extensive testing, but I am not sure it's ok to add non-standard interfaces for testing, either. >168 else > 169 state = Start; Here we are in an array, after the right bracket and we jump to the Start state, which allows an identifier. Does this mean that the following sequence will parse successfully? identifier[100]anotherIdentifier Looks great, otherwise!
Comment on attachment 62593 [details] Patch WebCore/storage/IDBKeyPath.cpp:193 + return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); Should we use the version of this function in ASCIITypes.h? WebCore/storage/IDBKeyPath.cpp:203 + token.start = token.end = m_ptr; Please don't use this form of assignment. WebCore/storage/IDBKeyPath.cpp:206 + ASSERT(m_ptr <= m_end); Should this be m_ptr < m_end ? WebCore/storage/IDBKeyPath.cpp:237 + return TokError; I'd move this into the default case, but that's just a stylistic point. WebCore/storage/IDBKeyPath.cpp:242 + const UChar* runStart = m_ptr; I'd just call that "start" WebCore/storage/IDBKeyPath.cpp:249 + token.stringToken = builder.toString(); Why are we using a string builder here? The append is outside the loop. WebCore/storage/IDBKeyPath.cpp:265 + return TokError; so 01 only tokenizes the "0" and leaves the 1. I presume that's what you mean. WebCore/storage/IDBKeyPath.cpp:271 + for (i = 0; i < token.end - token.start; i++) { Please break "token.end - token.start" out as a length variable since you're using it more than once. WebCore/storage/IDBKeyPath.cpp:275 + buffer[i] = 0; Yuck. This seems error prone. I can see that it's correct, but just computing the int directly seems way easier.
Created attachment 62703 [details] Patch
thanks everyone! this is not yet final: I'll remove the IDLs and add support via TestShell / DRT wrappers soon, but would appreciate if you could take another look at the parser? replies inline: jorlow: (In reply to comment #5) > (In reply to comment #4) > > thanks for the comments so far, Jeremy! > > > > brief background / reasoning: > > 1. not sure whether or not we'll support arrays, but it seems that if we support named properties, we should support indexed values as well. > > 2. the reason for a separate class is only to expose for tests, since we have no way to exercise this code (yet). > > 3. just passing the array with a known struct will be fine, the API is mostly exposed for tests. however, the basic concept would still be the same: I think it'd be nice to have the "SerializedScriptValueExtractor" just traverse some kind of structure: get a name or index, de-serialize, repeat.. having this separation would allow us to decouple the the extractor from the parser. > > I agree they should be separate. But is there any possible way we can make the "deserialize" code within SerializedScriptValue use the same extractor? yep, this is going to be the second step. either: 1. homogeneize the SerializedScriptValue interfaces to provide something like "getNamedPropertyValue()" and "getIndexedPropertyValue()" so that the extractor doesn't care if it's talking to V8 or JSC. 2. pass this array (or iterator) of identifiers, and let the SerializedScriptValues deal with it. I'm not there yet to decide, but would like to go with (1) rather than (2).. > > Indeed. The keyPath is not yet fully specced, so I'm flying a bit blind here. > > However, I think the main purpose of decoupling the "SerializedScriptValueExtractor" from the "KeyPathParser" is a "Good Thing", and will allow us to move to a point where the "Extractor"s (both JSC and V8) don't care about what a KeyPath is, as long as it provides an iterator of property names / indexes. > > thoughts? > > > > ahn, I also exposed a "parserError", the caller can do whatever we decide (either throw an exception, ignore, use what was able to parse, ...) > > But if there's an error, should we even return an object? this is pending decision on the spec. also: I fixed the JS identifiers. andreip: (In reply to comment #6) > > indexedDB.makeKeyPath('foo.bar.zoo') > > Hmm, makeKeyPath() is not a method that exists in the spec. Rather than extending the indexedDB object with methods that are intended solely for testing, perhaps we should add a couple of methods to LayoutTestController instead? yep, I'll do this shortly (need to create WebKit wrappers, etc..) but meanwhile, could you please take another look at the parsing bits? I hope the next patches will only add the wrappers and not touch the logic. > > > module storage { > > // FIXME: IDBKeyPath is public only to allow testing, remove as soon as we have more comprehensive > test infrastructure for IndexedDB. > > interface [ > > Conditional=INDEXED_DATABASE > > ] IDBKeyPath { > > Oh, I see. Looking around, I cannot find any precedent for this. So maybe it's better to have a LTC method that returns true / false depending on whether the input is a valid keypath and was successfully parsed or not? I realize this doesn't allow as extensive testing, but I am not sure it's ok to add non-standard interfaces for testing, either. > > >168 else > > 169 state = Start; > > Here we are in an array, after the right bracket and we jump to the Start state, which allows an identifier. Does this mean that the following sequence will parse successfully? > > identifier[100]anotherIdentifier good point, added test case. > > Looks great, otherwise! thanks! abarth: (In reply to comment #7) > (From update of attachment 62593 [details]) > WebCore/storage/IDBKeyPath.cpp:193 > + return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); > Should we use the version of this function in ASCIITypes.h? done. > > WebCore/storage/IDBKeyPath.cpp:203 > + token.start = token.end = m_ptr; > Please don't use this form of assignment. fixed. > > WebCore/storage/IDBKeyPath.cpp:206 > + ASSERT(m_ptr <= m_end); > Should this be m_ptr < m_end ? done. > > WebCore/storage/IDBKeyPath.cpp:237 > + return TokError; > I'd move this into the default case, but that's just a stylistic point. not sure I understood this.. > > WebCore/storage/IDBKeyPath.cpp:242 > + const UChar* runStart = m_ptr; > I'd just call that "start" done. > > WebCore/storage/IDBKeyPath.cpp:249 > + token.stringToken = builder.toString(); > Why are we using a string builder here? The append is outside the loop. left over, using String directly. > > WebCore/storage/IDBKeyPath.cpp:265 > + return TokError; > so 01 only tokenizes the "0" and leaves the 1. I presume that's what you mean. it was an error, added a test case and fixed. > > WebCore/storage/IDBKeyPath.cpp:271 > + for (i = 0; i < token.end - token.start; i++) { > Please break "token.end - token.start" out as a length variable since you're using it more than once. see below. > > WebCore/storage/IDBKeyPath.cpp:275 > + buffer[i] = 0; > Yuck. This seems error prone. I can see that it's correct, but just computing the int directly seems way easier. I completely changed it to use to String::toIntStrict(), looks much simpler and safer now.
Created attachment 62830 [details] Patch
Hi, I completely removed the IDL and (un)related methods, so now IDBKeyPath is not accessible via javascript. I added a unit test, and as soon as IndexedDatabase starts using it, we'll be able to have more comprehensive and cross-platform LayoutTests. Another look, please? Thanks! (In reply to comment #10) > Created an attachment (id=62830) [details] > Patch
> void IDBKeyPath::parse() > (...) > case Array : { > (...) > m_lexer.next(); > token = m_lexer.currentToken(); Same question here as in 42976: You are only checking for dot or left bracket. But what if tokEnd follows? Seems like you're returning ParserErrorAfterArray in that case?
Created attachment 63186 [details] Patch
(In reply to comment #12) > > void IDBKeyPath::parse() > > (...) > > case Array : { > > (...) > > m_lexer.next(); > > token = m_lexer.currentToken(); > > Same question here as in 42976: You are only checking for dot or left bracket. But what if tokEnd follows? Seems like you're returning ParserErrorAfterArray in that case? yes, good point. fixed in the latest patch, and also tidy up with the following rule: 1. added an "End" state. 2. on any given state, there's a set of "state changes" depending on the condition. 3. if there's an error, it returns. I also added a couple of test for this case as well. hopefully this simplifies the state transitions, but I'd surely appreciate more eyes looking into it. another look, please?
LGTM
Created attachment 63487 [details] Patch
Created attachment 63489 [details] Patch
replying Jeremy's comments from https://bugs.webkit.org/show_bug.cgi?id=43276 the major change is that I removed the class and instead there's a function that returns a vector of IDBKeyPathElements. plus some other clean ups. more below. another look please? (In reply to comment #11) > (From update of attachment 63313 [details]) > WebCore/storage/IDBKeyPath.cpp:101 > + if (token == TokIdentifier) > Use a switch? hmm, it doesn't seem it'd make it any more readable. > > WebCore/storage/IDBKeyPath.cpp:46 > + , m_currentToken(0) > Use the enum value this is gone. > > WebCore/storage/IDBKeyPath.cpp:58 > + return m_currentToken < m_tokens.size(); > I guess I'd suggest just converting over all this hasNext/next stuff to just returning the raw vector. I don't feel strongly, but I just don't see much of a point of creating this interface. removed the class, converted to a single function that returns a vector of elements. > > WebCore/storage/IDBKeyPath.cpp:70 > + return ""; > Maybe you should just let it crash? I.e. not do bounds checking? obsolete. > > WebCore/storage/IDBKeyPath.cpp:90 > + return m_parserError; > Maybe all the methods should be ASSERTing that a parse error never happened? the function now returns an error, it may be better to let the caller decide what to do with it? > > WebCore/storage/IDBKeyPath.cpp:103 > + else if (token == TokLBracket) > I'd lean towards spelling these all the way out. Done. > > WebCore/storage/IDBKeyPath.cpp:116 > + Token idbToken; > these 2 token variable names are confusing. The idb doesn't make it any less so. Maybe 'outputToken' or something? There's now a TokenType (with the names spelled out as above), and IDBKeyPathElement, let me know if it's any clearer. > > > > general note: you should talk to the guy who does fuzzers (skylined..?) and see if we can get some coverage on this indeed! once we agree on the implementation, I'll add more people to it. > > also, could we just do this more simply? for example, in python pseudo code I could do this: > > tokens = [] > if (path == ""): > return [] > for (component in path.split(".")): > if (component ~= /^([a-zA-Z0-9_]+)\[([0-9]+)\]$/): > tokens.append($1, $2) > else if (component ~= /^([a-zA-Z0-9_]+)$/): > tokens.append($1, null) > else: > return error > return tokens > And that'd be about it. Do we have any sort of regex support build into WebKit already? The split would be pretty easy to do as well. I'm just concerned that we're jumping the gun with all of this complexity. > > Anyhow... I don't think this would be much simpler.. We'd still need to get the numbers out of the indices. There's the n-ary array case "foo[3][4]", etc.. seems that this single-pass parser might be simpler in the end? > > > WebCore/storage/IDBKeyPath.cpp:145 > + ASSERT(token.type == TokNumber); > this seems mildly redundant, but I'm ok with it > > WebCore/storage/IDBKeyPath.cpp:163 > + state = Array; > but this is an invalid state! not really. it's perfectly possible to have n-ary array, i.e., "foo[3][9][5].age". ValidKeyPath1 test exercises this. > > WebCore/storage/IDBKeyPath.cpp:147 > + idbToken.hasIndex = true; > maybe assert it was false? obsolete. > > WebCore/storage/IDBKeyPath.cpp:190 > + > default: > raise error as above, perhaps the caller should decide? > > WebCore/storage/IDBKeyPath.cpp:99 > + case Start : { > maybe pull this out of the state machine? done. > > WebCore/storage/IDBKeyPath.cpp:278 > + return TokError; > I'd just do an m_ptr >= m_end first and then return...then no need for an else done. > > WebCore/storage/IDBKeyPath.cpp:292 > + return TokError; > same again...test the error condition and return rather than putting everythign in a block done. > > WebCore/storage/IDBKeyPath.cpp:252 > + if (m_ptr < m_end && isSafeIdentifierStartCharacter(*m_ptr)) > ditto done. > > WebCore/storage/IDBKeyPath.h:84 > + class Lexer { > Why does this need to be defined here? moved to the cpp file. > > WebCore/storage/IDBKeyPath.h:116 > + const UChar* m_ptr; > should this be called m_cursor? hmm, cursor and idb is a combination already in use.. isn't ptr clear? > > WebCore/storage/IDBKeyPath.h:100 > + TokenType next() > these could all be on one line removed.
Comment on attachment 63489 [details] Patch get rid of the indexeddatabaserequest stuff
Created attachment 63618 [details] Patch
(In reply to comment #19) > (From update of attachment 63489 [details]) > get rid of the indexeddatabaserequest stuff done, it should now contain only the parser and associated tests.
Created attachment 64124 [details] Patch
Fixed some minor comments as a result of https://bugs.webkit.org/show_bug.cgi?id=43276 skylined: your name was mentioned as a possible reviewer, would you mind help us here?
Comment on attachment 64124 [details] Patch WebCore/storage/IDBKeyPath.cpp:58 + return m_currentTokenType; Bad indent. You can also put this all on one line. Why does the name of the access not match the name of the member variable? WebCore/storage/IDBKeyPath.cpp:67 + const IDBKeyPathElement& currentElement() Also can be on one line.
Created attachment 64214 [details] Patch
you gotten an r+ so unless you've made non-trivial changes, just commit.
thanks Adam! all comments addressed. Chris: jorlow suggested adding you as a reviewer for this patch, would you mind taking a look at it? quick background: this patch parses an "IDBKeyPath" (part of indexed database). an "IDBKeyPath" basically specifies an expression that can be applied to an object in order to extract a property, say "items[3].age" would be applied to an object to obtain the age of the third element in items array of said object. for completeness, the whole chain besides patch contains: the "extractor" bit, i.e., given parsed key path and an object, returns the key. https://bugs.webkit.org/show_bug.cgi?id=43276 the utility process (sandbox) that will run this code in the chromium side: http://codereview.chromium.org/3043037/show please, ping me if you need more information about this. (In reply to comment #24) > (From update of attachment 64124 [details]) > WebCore/storage/IDBKeyPath.cpp:58 > + return m_currentTokenType; > Bad indent. You can also put this all on one line. Why does the name of the access not match the name of the member variable? Done. > > WebCore/storage/IDBKeyPath.cpp:67 > + const IDBKeyPathElement& currentElement() > Also can be on one line. Done.
Looks like a lot of care has been taken; thanks. What about: + int number = numberAsString.toIntStrict(&ok); Does toIntStrict() permit negatives? If so, is that desired and should it be rejected? What about a crazy large int? Should that be rejected out of hand here or does that check belong at a different level?
Crazy large int sounds like a good test case, btw.
Comment on attachment 64214 [details] Patch Please address Evan's comments. I'll take one more quick skim through and rubber stamp it then.
Created attachment 64340 [details] Patch
thanks for the quick reply chris! comments addressed, more inline: (In reply to comment #28) > Looks like a lot of care has been taken; thanks. > What about: > + int number = numberAsString.toIntStrict(&ok); > > Does toIntStrict() permit negatives? If so, is that desired and should it be rejected? good point! made it unsigned, and using toUIntStrict. > > What about a crazy large int? Should that be rejected out of hand here or does that check belong at a different level? added a test for it (toUIntStrict fails for larger-than-int). also, you're correct: higher level (https://bugs.webkit.org/show_bug.cgi?id=43276) tests for the value being available in the array, it never de-references as-is. another quick look please?
Comment on attachment 64340 [details] Patch Thanks for getting a thorough review. It looks like everyone's comments have been addressed.
Committed r65417: <http://trac.webkit.org/changeset/65417>