Summary: | Implement ES 3.1 JSON object | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Adam Barth <abarth> | ||||||||||||||||||||
Component: | JavaScriptCore | Assignee: | Nobody <webkit-unassigned> | ||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||
Severity: | Normal | CC: | abarth, ap, arv, bradneuberg, cedricv, ddkilzer, dglazkov, emacemac7, erights, gavin.sharp, ggaren, marcus, mike, mjs, ml, ojan, rik, sam, sriram.neelakandan, timothy, tony, webmaster, zapperlott, zwarich | ||||||||||||||||||||
Priority: | P2 | Keywords: | ES5 | ||||||||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||||||||
Hardware: | All | ||||||||||||||||||||||
OS: | All | ||||||||||||||||||||||
URL: | http://wiki.ecmascript.org/doku.php?id=es3.1:es3.1_proposal_working_draft | ||||||||||||||||||||||
Bug Depends on: | 26249, 26587, 26592 | ||||||||||||||||||||||
Bug Blocks: | 25419 | ||||||||||||||||||||||
Attachments: |
|
Description
Adam Barth
2008-07-13 21:29:43 PDT
This will likely take more that one patch, so it might make sense to use this as a meta bug. I'm interested in making this happen, so I thought I'd start with the easy bits. :) Created attachment 22263 [details]
Work-in-progress: Step 1: Add skeleton JSON object
I'm not sure if we want to start landing these patches until we get some level of basic functionality.
Created attachment 22264 [details]
Work-in-progress: Step 2: Beginnings of JSON parser
This parser knows how to eat JSON and how not to eat non-JSON. The next step is to actually build a JSObject.
Created attachment 22265 [details]
Work-in-progress: Step 2: Beginnings of JSON parser
Fix a few bugs.
Created attachment 22271 [details]
Work-in-progress: Step 2: JSON parser front-end
Here is a reasonable front-end for the JSON parser. We're still missing the back end.
Created attachment 22272 [details]
Work-in-progress: Step 3: Parser frond-end tests
Tests for the parser front end. These test cases come from Firefox. Should these tests be in the JavaScriptCore/tests directory? I wasn't sure (1) how to add tests there or (2) how to get these exact characters to throw at the parser without XMLHttpRequest.
Comment on attachment 22263 [details]
Work-in-progress: Step 1: Add skeleton JSON object
We usually don't use a static property map in JSC for things like this, and instead put the functions directly in the Object it self, see DateConstructor for an example of how do this.
Created attachment 22277 [details] Work-in-progress: Step 4: Parser back-end with tests Here is a back-end for the parser. Sam, I'll address your comment next time I rebase the whole patch stream. I'm going to ask Doug if he has more test cases. Here is the Firefox native JSON bug: https://bugzilla.mozilla.org/show_bug.cgi?id=408838 Comment on attachment 22271 [details]
Work-in-progress: Step 2: JSON parser front-end
I didn't attempt to really review this (and the patch is not up for review anyway :) ), but here are some comments I had while skimming over.
+static inline void ws(State* state)
This definitely needs a better name, and ideally, should share code with other isWhitespace/skipWhitespace functions we have (e.g. Lexer::isWhiteSpace, isASCIIWhitespace).
+ return paddedLiteral(state, L'[');
On the Mac, wchar_t is 32-bit, so we cannot use this type and its associated compiler features.
+ static const UChar falseLiteral[] = { 'f', 'a', 'l', 's', 'e', '\0' };
I'm not 100% sure, but I suspect that this array may be initialized at runtime, not compile time. If so, this initialization is not thread safe.
+ while(digit(state))
There should be a space after while.
A general question: what is the reason for writing a custom parser, and not using bison?
I discussed this with Sam, Geoff and Oliver at the office today. Most of us agreed that using the existing JavaScriptCore lexer along with a bison grammar would be a better approach than an ad-hoc custom parser. Geoff also argued that it would be better to use the full JS parser and then study the syntax tree to ensure that it has no JSON violations, but the rest of us thought the existing lexer plus a custom grammar would work better. (In theory though, I guess it could create the same kinds of AST nodes as the full parser and thus the back end could just consist of compiling and executing the resulting syntax tree. Or alternately the grammar could directly create JS values and combine them, but there might be GC problems with this approach). > I discussed this with Sam, Geoff and Oliver at the office today. Most of us > agreed that using the existing JavaScriptCore lexer along with a bison grammar > would be a better approach than an ad-hoc custom parser. Unfortunately, we can't reuse the existing lexer because JSON has different lexing rules. For example, JSON only supports some of JavaScript string escape sequences. > Geoff also argued that > it would be better to use the full JS parser and then study the syntax tree to > ensure that it has no JSON violations, but the rest of us thought the existing > lexer plus a custom grammar would work better. Just looking at the syntax tree is far too late. There are many requirements of the JSON RFC that are lost at that point. I've made some amount of progress with a bison grammar for a new lexer. I'll post a patch once I get sufficiently far to be able to handle simple examples. It is unclear whether this approach will yield a simpler parser, but the result may be more maintainable as you folks are already used to maintaining a bison parser. I've gotten distracted with other things. It will probably be a bit before I can work on this more. It might be good to have a common parser that is used both from script and C++ side, for things like Inspector JSON transport (http://trac.webkit.org/wiki/ProposedWebInspectorRearchitecting). I am interested in helping out with this patch, but may need guidance :) Is there any help that is required in getting this to trunk ? Please do let me know. (In reply to comment #14) > Is there any help that is required in getting this to trunk ? > Please do let me know. The existing patches need to be fixed and attached for review. See this web page for more details about the process (except in this case, you have some patches to start from): http://webkit.org/coding/contributing.html Per above comments, the consensus seems to be that the existing patches go in the wrong direction, and we'd prefer not to have a custom parser. > Per above comments, the consensus seems to be that the existing patches go in
> the wrong direction, and we'd prefer not to have a custom parser.
Yes, I was working on a Yacc-based parser (which is what we use for JavaScript), but I got distracted by other things. Please feel free to steal this bug. :)
Created attachment 28827 [details]
Patch to Add JSON Object
The patch include the changes for adding JSON object and test cases, There are few thing on the code could be changed.
1. Build the object while parsing instead of the syntax tree.
2. Move some of the code in JSONLexer to common object so it can be used by both JS lexer and JSON.
I'll try to review this soon. It would be good to review/land this soon, as we will need a JSON parser for bug 25419. Native JSON support was recently added to V8. It might be worthwhile cross-testing. Here are some tests from V8: http://code.google.com/p/v8/source/browse/trunk/test/mjsunit/json.js Comment on attachment 28827 [details]
Patch to Add JSON Object
This patch uses tabs, which is against the style guidelines (and will cause the commit to fail).
Please remove the tabs and re-post. Also, I think we could review and land the tests first and then review the code. Or at least I think they should be reviewed separately to make the review smaller.
r- for the tabs.
I will remove tabs and post separate patch for the test cases. But it looks like my changes not compatible with latest version of the source code, so I need to fix them first. Created attachment 30293 [details]
JSON test cases pacth
Test cases for JSON object, I will try and include google test case
Created attachment 30295 [details]
Patch to add JSON object - remove tabs
Tab is now removed from the source files, Makefiles still have the tabs because it's required
I have been looking at the JSON lexer grammar recently and have realised the the specified JSON lexer is much more restrictive than the standard lexer used by JS. This means that we really need to consider whether we use a strict json lexer or the much more forgiving JS lexer in JSC, using the strict JSON lexer seems "nice", but the standard json2.js library just uses eval which could mean that the strict lexer will result in failures that are not present in when using the non-native eval based implementation. (In reply to comment #26) > the standard json2.js library just uses eval which could mean > that the strict lexer will result in failures that are not present in when > using the non-native eval based implementation. I recommend testing Firefox 3.5 and IE8 to see how their lexer behaves. We should match them as closely as possible. > I recommend testing Firefox 3.5 and IE8 to see how their lexer behaves. We
> should match them as closely as possible.
>
I did test firefox 3.5 and they are using strict lexer, but I'm not sure about IE8 I don't realy use windows.
I had look at json2.js it looks like the code use regular expression on top of eval for restriction but one thing is not clear wither it allows use of single quote with members name (i.e {'x':1}).
> I had look at json2.js it looks like the code use regular expression on top of
> eval for restriction but one thing is not clear wither it allows use of single
> quote with members name (i.e {'x':1}).
json2 is much more lax than we want to be. We should be as strict as IE8 and Firefox 3.5.
Comment on attachment 30293 [details]
JSON test cases pacth
Okay, while in general i like the tests this patch does i'd like the testcases to use our "standard" js testcase model then tryBackend becomes
shouldBe("JSON2.stringify(JSON2.parse(text))", "JSON.stringify(JSON.parse(text))"), however this is optional (for now ;) )
I'd also like a few tests to cover large inputs, as well as use of the replacement/reviver functions, specifically to test
* which reviver functions get called; and
* the order in which reviver functions are called
I'll r- this patch due to the additional testing that i'd like, but by and large i like what's currently here, it just needs to be more complete.
Comment on attachment 30295 [details]
Patch to add JSON object - remove tabs
I don't like this patch, while the JSON object itself is fine, i don't believe it is necessary to use a yacc parser -- given the simplicity of the JSON grammar i believe a hand coded recursive descent parser is likely to be faster, with a little care i believe this will also make it easier to have the parser avoid ever creating an AST in the case where we do not have custom reviver functions.
(Addtionally this particular grammar tracks a lot of information that we track in the normal JS grammar which is of no value to the JSON parser) -- I'm also not entirely sure i like the lexer, but certainly i prefer it over the grammar implementation. We can in fact probably just use the literal parser we preflight eval with to achieve this.
Sorry for the delay in reviewing.
If you want to discuss more directly i tend to be on #webkit as olliej fairly continuously.
I agree with your comments but yhe use of bison parser instead of custom parser was based on the Comment #10 From Maciej Stachowiak and Comment #16 From Alexey Proskuryakov. (In reply to comment #31) > (From update of attachment 30295 [details] [review]) > I don't like this patch, while the JSON object itself is fine, i don't believe > it is necessary to use a yacc parser -- given the simplicity of the JSON > grammar i believe a hand coded recursive descent parser is likely to be faster, > with a little care i believe this will also make it easier to have the parser > avoid ever creating an AST in the case where we do not have custom reviver > functions. > > (Addtionally this particular grammar tracks a lot of information that we track > in the normal JS grammar which is of no value to the JSON parser) -- I'm also > not entirely sure i like the lexer, but certainly i prefer it over the grammar > implementation. We can in fact probably just use the literal parser we > preflight eval with to achieve this. > > Sorry for the delay in reviewing. > > If you want to discuss more directly i tend to be on #webkit as olliej fairly > continuously. > It's true that Alexey and I originally suggested that. But seeing the patch, and in light of Oliver's recent fast literal parser, I think Oliver is correct. I've locally got the literal parser to the point where it is conformant with JSON spec behaviour, and i'm working on getting Husam's JSON object hooked up to it. Will put initial patch up in a few hours |