Bug 98814

Summary: IndexedDB: Key paths should support non-ASCII identifiers
Product: WebKit Reporter: Joshua Bell <jsbell>
Component: WebCore Misc.Assignee: Joshua Bell <jsbell>
Status: RESOLVED FIXED    
Severity: Normal CC: alecflett, dgrogan, haraken, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch for landing none

Description Joshua Bell 2012-10-09 12:17:36 PDT
IDB spec http://dvcs.w3.org/hg/IndexedDB/raw-file/tip/Overview.html#dfn-key-path says:

A key path is a DOMString or Array that defines how to extract a key from a value. A valid key path is one of:
* An empty DOMString.
* A DOMString containing a JavaScript identifier [ECMA-262].
* A DOMString containing multiple Javascript identifiers separated by periods (ASCII character code 46) [ECMA-262].
* A non-empty Array containing only DOMStrings conforming to the above requirements.

And ECMA-262 http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf#page=29 sayeth:

"Identifier Names are tokens that are interpreted according to the grammar given in the ―Identifiers‖ section of 
chapter 5 of the Unicode standard, with some small modifications. An Identifier is an IdentifierName that is not 
a  ReservedWord (see 7.6.1). The Unicode identifier grammar is based on both normative and informative 
character categories specified by the Unicode Standard. The characters in the specified categories in version 
3.0 of the Unicode standard must be treated as in those categories by all conforming ECMAScript 
implementations.

"This standard specifies specific character additions: The dollar sign ($) and the underscore (_) are permitted 
anywhere in an IdentifierName."

And this is followed by a grammar.

However, the WebKit IDB implementation uses:

static inline bool isSafeIdentifierStartCharacter(UChar c)
{
    return isASCIIAlpha(c) || (c == '_') || (c == '$');
}

static inline bool isSafeIdentifierCharacter(UChar c)
{
    return isASCIIAlphanumeric(c) || (c == '_') || (c == '$');
}

... so we fail http://w3c-test.org/webapps/IndexedDB/tests/submissions/Opera/keypath.htm
Comment 1 Joshua Bell 2012-10-10 14:31:05 PDT
Created attachment 168072 [details]
Patch
Comment 2 Joshua Bell 2012-10-10 14:31:23 PDT
dgrogan@, alecflett@ - can you take a look?
Comment 3 Joshua Bell 2012-10-10 16:58:14 PDT
Created attachment 168097 [details]
Patch
Comment 4 Joshua Bell 2012-10-10 16:59:06 PDT
Updated patch fixes escaping and adds invalid keypath tests (other Unicode categories as initial/trailing identifier characters)
Comment 5 Joshua Bell 2012-10-11 10:22:54 PDT
+haraken@ who wrote implemented the current lexer
Comment 6 Joshua Bell 2012-10-11 10:24:38 PDT
And note that I filed https://www.w3.org/Bugs/Public/show_bug.cgi?id=19450 about an edge case (reserved words being excluded - FF doesn't and I don't think it should be required)
Comment 7 Joshua Bell 2012-10-11 10:27:54 PDT
Also, I did not add additional cases to WebKit/chromium/tests/IDBKeyPathTest.cpp - it seemed redundant with the layout test once the basic API was exercised from C++.
Comment 8 David Grogan 2012-10-12 15:01:00 PDT
Comment on attachment 168097 [details]
Patch

LGTM

After looking at 3.1.5 Key Path in the idb spec and taking a cursory glance at ecma-262, this seems reasonable.
Comment 9 Joshua Bell 2012-10-12 15:33:26 PDT
haraken@ - r?
Comment 10 Kentaro Hara 2012-10-14 08:19:37 PDT
Comment on attachment 168097 [details]
Patch

Looks OK
Comment 11 Joshua Bell 2012-10-15 10:09:05 PDT
Created attachment 168734 [details]
Patch for landing
Comment 12 WebKit Review Bot 2012-10-15 14:22:11 PDT
Comment on attachment 168734 [details]
Patch for landing

Clearing flags on attachment: 168734

Committed r131356: <http://trac.webkit.org/changeset/131356>
Comment 13 WebKit Review Bot 2012-10-15 14:22:14 PDT
All reviewed patches have been landed.  Closing bug.