Bug 20031

Summary: Implement ES 3.1 JSON object
Product: WebKit Reporter: Adam Barth <abarth>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ap, arv, bradneuberg, cedricv, ddkilzer, dglazkov, emacemac7, erights, gavin.sharp, ggaren, marcus, mike, mjs, ml, ojan, rik, sam, sriram.neelakandan, timothy, tony, webmaster, zapperlott, zwarich
Priority: P2 Keywords: ES5
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
URL: http://wiki.ecmascript.org/doku.php?id=es3.1:es3.1_proposal_working_draft
Bug Depends on: 26249, 26587, 26592    
Bug Blocks: 25419    
Attachments:
Description Flags
Work-in-progress: Step 1: Add skeleton JSON object
none
Work-in-progress: Step 2: Beginnings of JSON parser
none
Work-in-progress: Step 2: Beginnings of JSON parser
none
Work-in-progress: Step 2: JSON parser front-end
none
Work-in-progress: Step 3: Parser frond-end tests
none
Work-in-progress: Step 4: Parser back-end with tests
none
Patch to Add JSON Object
eric: review-
JSON test cases pacth
oliver: review-
Patch to add JSON object - remove tabs oliver: review-

Description Adam Barth 2008-07-13 21:29:43 PDT
ECMAScript 3.1 provides support for native parsing and stringifying of JSON objects.  It looks like IE8 and Firefox 3.1 are going to support this as well.  Maciej wrote on es4-discuss that we would gladly accept patches for ES 3.1 features.  Let's get to it!
Comment 1 Adam Barth 2008-07-13 21:31:15 PDT
This will likely take more that one patch, so it might make sense to use this as a meta bug.  I'm interested in making this happen, so I thought I'd start with the easy bits.  :)
Comment 2 Adam Barth 2008-07-14 02:53:05 PDT
Created attachment 22263 [details]
Work-in-progress: Step 1: Add skeleton JSON object

I'm not sure if we want to start landing these patches until we get some level of basic functionality.
Comment 3 Adam Barth 2008-07-14 02:55:11 PDT
Created attachment 22264 [details]
Work-in-progress: Step 2: Beginnings of JSON parser

This parser knows how to eat JSON and how not to eat non-JSON.  The next step is to actually build a JSObject.
Comment 4 Adam Barth 2008-07-14 03:10:27 PDT
Created attachment 22265 [details]
Work-in-progress: Step 2: Beginnings of JSON parser

Fix a few bugs.
Comment 5 Adam Barth 2008-07-14 15:54:44 PDT
Created attachment 22271 [details]
Work-in-progress: Step 2: JSON parser front-end

Here is a reasonable front-end for the JSON parser.  We're still missing the back end.
Comment 6 Adam Barth 2008-07-14 15:57:32 PDT
Created attachment 22272 [details]
Work-in-progress: Step 3: Parser frond-end tests

Tests for the parser front end.  These test cases come from Firefox.  Should these tests be in the JavaScriptCore/tests directory?  I wasn't sure (1) how to add tests there or (2) how to get these exact characters to throw at the parser without XMLHttpRequest.
Comment 7 Sam Weinig 2008-07-14 17:16:02 PDT
Comment on attachment 22263 [details]
Work-in-progress: Step 1: Add skeleton JSON object

We usually don't use a static property map in JSC for things like this, and instead put the functions directly in the Object it self, see DateConstructor for an example of how do this.
Comment 8 Adam Barth 2008-07-15 01:01:51 PDT
Created attachment 22277 [details]
Work-in-progress: Step 4: Parser back-end with tests

Here is a back-end for the parser.  Sam, I'll address your comment next time I rebase the whole patch stream.  I'm going to ask Doug if he has more test cases.

Here is the Firefox native JSON bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=408838
Comment 9 Alexey Proskuryakov 2008-07-15 02:46:51 PDT
Comment on attachment 22271 [details]
Work-in-progress: Step 2: JSON parser front-end

I didn't attempt to really review this (and the patch is not up for review anyway :) ), but here are some comments I had while skimming over.

+static inline void ws(State* state)

This definitely needs a better name, and ideally, should share code with other isWhitespace/skipWhitespace functions we have (e.g. Lexer::isWhiteSpace, isASCIIWhitespace).

+    return paddedLiteral(state, L'[');

On the Mac, wchar_t is 32-bit, so we cannot use this type and its associated compiler features.

+    static const UChar falseLiteral[] = { 'f', 'a', 'l', 's', 'e', '\0' };

I'm not 100% sure, but I suspect that this array may be initialized at runtime, not compile time. If so, this initialization is not thread safe.

+        while(digit(state))

There should be a space after while.

A general question: what is the reason for writing a custom parser, and not using bison?
Comment 10 Maciej Stachowiak 2008-07-15 03:27:38 PDT
I discussed this with Sam, Geoff and Oliver at the office today. Most of us agreed that using the existing JavaScriptCore lexer along with a bison grammar would be a better approach than an ad-hoc custom parser. Geoff also argued that it would be better to use the full JS parser and then study the syntax tree to ensure that it has no JSON violations, but the rest of us thought the existing lexer plus a custom grammar would work better. (In theory though, I guess it could create the same kinds of AST nodes as the full parser and thus the back end could just consist of compiling and executing the resulting syntax tree. Or alternately the grammar could directly create JS values and combine them, but there might be GC problems with this approach).
Comment 11 Adam Barth 2008-07-17 03:59:29 PDT
> I discussed this with Sam, Geoff and Oliver at the office today. Most of us
> agreed that using the existing JavaScriptCore lexer along with a bison grammar
> would be a better approach than an ad-hoc custom parser.

Unfortunately, we can't reuse the existing lexer because JSON has different lexing rules.  For example, JSON only supports some of JavaScript string escape sequences.

> Geoff also argued that
> it would be better to use the full JS parser and then study the syntax tree to
> ensure that it has no JSON violations, but the rest of us thought the existing
> lexer plus a custom grammar would work better.

Just looking at the syntax tree is far too late.  There are many requirements of the JSON RFC that are lost at that point.

I've made some amount of progress with a bison grammar for a new lexer.  I'll post a patch once I get sufficiently far to be able to handle simple examples.  It is unclear whether this approach will yield a simpler parser, but the result may be more maintainable as you folks are already used to maintaining a bison parser.
Comment 12 Adam Barth 2008-07-24 00:06:00 PDT
I've gotten distracted with other things.  It will probably be a bit before I can work on this more.
Comment 13 Dimitri Glazkov (Google) 2008-10-15 07:28:52 PDT
It might be good to have a common parser that is used both from script and C++ side, for things like Inspector JSON transport (http://trac.webkit.org/wiki/ProposedWebInspectorRearchitecting).

I am interested in helping out with this patch, but may need guidance :)
Comment 14 Sriram Neelakandan 2008-12-11 00:51:12 PST
Is there any help that is required in getting this to trunk ?
Please do let me know.
Comment 15 David Kilzer (:ddkilzer) 2008-12-11 06:26:19 PST
(In reply to comment #14)
> Is there any help that is required in getting this to trunk ?
> Please do let me know.

The existing patches need to be fixed and attached for review.  See this web page for more details about the process (except in this case, you have some patches to start from):

http://webkit.org/coding/contributing.html
Comment 16 Alexey Proskuryakov 2008-12-11 11:01:43 PST
Per above comments, the consensus seems to be that the existing patches go in the wrong direction, and we'd prefer not to have a custom parser.
Comment 17 Adam Barth 2008-12-11 11:09:55 PST
> Per above comments, the consensus seems to be that the existing patches go in
> the wrong direction, and we'd prefer not to have a custom parser.

Yes, I was working on a Yacc-based parser (which is what we use for JavaScript), but I got distracted by other things.  Please feel free to steal this bug.  :)
Comment 18 Husam Senussi 2009-03-21 19:02:33 PDT
Created attachment 28827 [details]
Patch to Add JSON Object

The patch include  the changes for adding JSON object and test cases, There are few thing on the code could be changed.

1. Build the object while parsing instead of the syntax tree.
2. Move some of the code in JSONLexer to common object so it can be used by both JS lexer and JSON.
Comment 19 Cameron Zwarich (cpst) 2009-03-29 09:50:54 PDT
I'll try to review this soon.
Comment 20 Timothy Hatcher 2009-04-30 09:00:27 PDT
It would be good to review/land this soon, as we will need a JSON parser for bug 25419.
Comment 21 Adam Barth 2009-04-30 09:54:05 PDT
Native JSON support was recently added to V8.  It might be worthwhile cross-testing.  Here are some tests from V8:

http://code.google.com/p/v8/source/browse/trunk/test/mjsunit/json.js
Comment 22 Eric Seidel (no email) 2009-05-11 05:58:49 PDT
Comment on attachment 28827 [details]
Patch to Add JSON Object

This patch uses tabs, which is against the style guidelines (and will cause the commit to fail).

Please remove the tabs and re-post.  Also, I think we could review and land the tests first and then review the code.  Or at least I think they should be reviewed separately to make the review smaller.

r- for the tabs.
Comment 23 Husam Senussi 2009-05-11 15:18:14 PDT
I will remove tabs and post separate  patch for the test cases.

But it looks like  my changes not compatible with latest version of the source code, so I need to fix them first.

 
Comment 24 Husam Senussi 2009-05-13 16:01:15 PDT
Created attachment 30293 [details]
JSON test cases pacth

Test cases for JSON object, I will try and include google test case
Comment 25 Husam Senussi 2009-05-13 16:28:19 PDT
Created attachment 30295 [details]
Patch to add JSON object - remove tabs

Tab is now removed from the source files, Makefiles still have the tabs because it's required
Comment 26 Oliver Hunt 2009-05-19 23:13:08 PDT
I have been looking at the JSON lexer grammar recently and have realised the the specified JSON lexer is much more restrictive than the standard lexer used by JS.  This means that we really need to consider whether we use a strict json lexer or the much more forgiving JS lexer in JSC, using the strict JSON lexer seems "nice", but the standard json2.js library just uses eval which could mean that the strict lexer will result in failures that are not present in when using the non-native eval based implementation.
Comment 27 Adam Barth 2009-05-19 23:44:47 PDT
(In reply to comment #26)
> the standard json2.js library just uses eval which could mean
> that the strict lexer will result in failures that are not present in when
> using the non-native eval based implementation.

I recommend testing Firefox 3.5 and IE8 to see how their lexer behaves.  We should match them as closely as possible.
Comment 28 Husam Senussi 2009-05-20 02:20:09 PDT
> I recommend testing Firefox 3.5 and IE8 to see how their lexer behaves.  We
> should match them as closely as possible.
> 

I did test firefox 3.5 and they are using strict lexer, but I'm not sure about IE8 I don't realy use windows.

I had look at json2.js it looks like the code use regular expression on top of  eval for restriction but one thing is not clear wither it allows use of single quote with members name (i.e  {'x':1}).
Comment 29 Adam Barth 2009-05-20 09:16:03 PDT
> I had look at json2.js it looks like the code use regular expression on top of 
> eval for restriction but one thing is not clear wither it allows use of single
> quote with members name (i.e  {'x':1}).

json2 is much more lax than we want to be.  We should be as strict as IE8 and Firefox 3.5.
Comment 30 Oliver Hunt 2009-05-22 19:42:03 PDT
Comment on attachment 30293 [details]
JSON test cases pacth

Okay, while in general i like the tests this patch does i'd like the testcases to use our "standard" js testcase model then tryBackend becomes
shouldBe("JSON2.stringify(JSON2.parse(text))", "JSON.stringify(JSON.parse(text))"), however this is optional (for now ;) )

I'd also like a few tests to cover large inputs, as well as use of the replacement/reviver functions, specifically to test
 * which reviver functions get called; and
 * the order in which reviver functions are called

I'll r- this patch due to the additional testing that i'd like, but by and large i like what's currently here, it just needs to be more complete.
Comment 31 Oliver Hunt 2009-05-22 19:50:36 PDT
Comment on attachment 30295 [details]
Patch to add JSON object - remove tabs

I don't like this patch, while the JSON object itself is fine, i don't believe it is necessary to use a yacc parser -- given the simplicity of the JSON grammar i believe a hand coded recursive descent parser is likely to be faster, with a little care i believe this will also make it easier to have the parser avoid ever creating an AST in the case where we do not have custom reviver functions.

(Addtionally this particular grammar tracks a lot of information that we track in the normal JS grammar which is of no value to the JSON parser) -- I'm also not entirely sure i like the lexer, but certainly i prefer it over the grammar implementation.  We can in fact probably just use the literal parser we preflight eval with to achieve this.

Sorry for the delay in reviewing.

If you want to discuss more directly i tend to be on #webkit as olliej fairly continuously.
Comment 32 Husam Senussi 2009-05-23 10:00:41 PDT
I agree with your comments but yhe use of bison parser instead of custom parser was based on the Comment #10 From Maciej Stachowiak and Comment #16 From Alexey Proskuryakov.

(In reply to comment #31)
> (From update of attachment 30295 [details] [review])
> I don't like this patch, while the JSON object itself is fine, i don't believe
> it is necessary to use a yacc parser -- given the simplicity of the JSON
> grammar i believe a hand coded recursive descent parser is likely to be faster,
> with a little care i believe this will also make it easier to have the parser
> avoid ever creating an AST in the case where we do not have custom reviver
> functions.
> 
> (Addtionally this particular grammar tracks a lot of information that we track
> in the normal JS grammar which is of no value to the JSON parser) -- I'm also
> not entirely sure i like the lexer, but certainly i prefer it over the grammar
> implementation.  We can in fact probably just use the literal parser we
> preflight eval with to achieve this.
> 
> Sorry for the delay in reviewing.
> 
> If you want to discuss more directly i tend to be on #webkit as olliej fairly
> continuously.
> 

Comment 33 Maciej Stachowiak 2009-05-24 04:30:41 PDT
It's true that Alexey and I originally suggested that. But seeing the patch, and in light of Oliver's recent fast literal parser, I think Oliver is correct.
Comment 34 Oliver Hunt 2009-06-03 16:48:07 PDT
I've locally got the literal parser to the point where it is conformant with JSON spec behaviour, and i'm working on getting Husam's JSON object hooked up to it.

Will put initial patch up in a few hours 
Comment 35 Oliver Hunt 2009-06-22 22:49:50 PDT
JSON is now complete, made up with
r44550
r44813
r44923
r44924
r44929
r44931
r44968
r44974