Bug 41844 - JavaScript parser violates ECMA automatic semicolon insertion rule
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 528+ (Nightly build)
Hardware: PC All
Importance: P2 Normal
Assignee: Nobody
 
Reported: 2010-07-08 02:51 PDT by Kent Hansen
Modified: 2010-07-08 16:24 PDT

Attachments
Patch (4.23 KB, patch)
2010-07-08 15:56 PDT, Oliver Hunt
barraclough: review+

Description Kent Hansen 2010-07-08 02:51:44 PDT
The following snippet:

    #include <JavaScriptCore/JavaScriptCore.h>
    #include <stdio.h>

    int main(void)
    {
        JSGlobalContextRef context = JSGlobalContextCreateInGroup(NULL, NULL);
        const char *script = "if (0)";
        JSValueRef val = JSEvaluateScript(context, JSStringCreateWithUTF8CString(script), NULL, NULL, 1, NULL);
        JSStringRef str = JSValueToStringCopy(context, val, NULL);
        char buf[256];
        JSStringGetUTF8CString(str, buf, 256);
        printf("%s\n", buf);
        return 0;
    }

prints "undefined".

ECMA-262 5th ed, section 7.9.1 "Rules of Automatic Semicolon Insertion", states: "When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is automatically inserted at the end of the input stream."

So far, so good.
But then the above is followed by: "However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement [...]."

When the program is "if (0)", as in the above snippet, per the above rule, a semicolon should _not_ automatically be inserted. Instead a SyntaxError should be thrown.
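
The expected behaviour can also be checked from script. The following is only a sketch: the helper name parsesAsProgram is made up for illustration, and it assumes the host engine follows section 7.9.1. It uses indirect eval so the text is treated as global code with nothing appended to it. Note that eval also runs the code, so it should only be fed harmless inputs.

```javascript
// Hypothetical helper (not part of any engine API): returns true when
// `source` parses as a complete Program, false when it is a SyntaxError.
// Indirect eval evaluates the text as global code; it also executes it.
function parsesAsProgram(source) {
  try {
    (0, eval)(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // any other error means the text parsed but failed at run time
  }
}

const results = {
  // No statement follows the condition. Inserting ";" here would create
  // an empty statement, which the overriding condition forbids.
  missingBody: parsesAsProgram("if (0)"),
  // An explicitly written empty statement is allowed.
  explicitEmpty: parsesAsProgram("if (0);"),
  // Ordinary end-of-input insertion: the ";" terminates a var statement.
  asiAtEnd: parsesAsProgram("var a = 1"),
};

console.log(results);
```

In an engine that implements the rule, missingBody should be false and the other two true.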
Comment 1 Oliver Hunt 2010-07-08 15:18:59 PDT
This isn't automatic semicolon insertion in the usual sense, e.g.
{ if (0) }
will fail to parse.

The parse success is due to the lexer automatically inserting a semicolon at the end of a script if it has not seen a line terminator when it reaches the end. I'm not sure what the specific reason for it is.
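
To make that concrete, here is a rough model of the behaviour described above. It is not JavaScriptCore code: appendFinalSemicolon and parses are hypothetical names, and the rule "append ';' unless the script ends with a line terminator" is a simplifying assumption based only on this comment. It rewrites the text the way the lexer is said to, then asks a conforming host engine whether the result parses.

```javascript
// Hypothetical model, not JavaScriptCore source: append ";" at end of input
// unless the script already ends with a line terminator.
const LINE_TERMINATORS = ["\n", "\r", "\u2028", "\u2029"];

function appendFinalSemicolon(source) {
  const last = source.slice(-1);
  if (last === "" || LINE_TERMINATORS.includes(last)) return source;
  return source + ";";
}

// True when the text parses as global code in the host engine.
// Indirect eval also executes the text, so keep the inputs harmless.
function parses(source) {
  try {
    (0, eval)(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const cases = ["if (0)", "{ if (0) }", "if (0)\n"];
const report = cases.map((source) => ({
  source,
  asWritten: parses(source),
  afterModel: parses(appendFinalSemicolon(source)),
}));

console.log(report);
```

Under this model only the bare "if (0)" changes from a parse failure to a success. The braced form still fails because the appended semicolon lands after the closing brace, and the form with a trailing newline is left untouched.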
Comment 2 Oliver Hunt 2010-07-08 15:56:09 PDT
Created attachment 60976 [details]
Patch
Comment 3 Darin Adler 2010-07-08 16:10:02 PDT
Comment on attachment 60976 [details]
Patch

What’s the chance that some real world WebKit-only content depends on the broken behavior?
Comment 4 Oliver Hunt 2010-07-08 16:14:55 PDT
(In reply to comment #3)
> (From update of attachment 60976 [details])
> What’s the chance that some real world WebKit-only content depends on the broken behavior?

Fairly low -- no other browser does this, and it depends on having the incorrect code as the very last thing in the file (e.g. no trailing newlines or other syntax).
Comment 5 Oliver Hunt 2010-07-08 16:24:04 PDT
Committed r62862