Bug 63340

Summary: Lexer error messages are not very informative
Product: WebKit Reporter: Oliver Hunt <oliver>
Component: JavaScriptCoreAssignee: Juan C. Montemayor <j.mont>
Status: RESOLVED FIXED    
Severity: Normal CC: dglazkov, ggaren, oliver, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from ec2-cr-linux-03
none
Patch
none
Proposed patch
none
fixed patch
none
proposed patch
none
revised patch
none
patch none

Description Oliver Hunt 2011-06-24 12:12:46 PDT
Current lexer error messages basically produce a dump of the last sequence of characters seen by the lexer, without any indication of why those characters are wrong, it would be nice to get errors along the lines of

"blah blah blah
=>
"Unterminated string literal"

"use strict"; "\03"
=>
"Octal escapes are forbidden in strict mode"

etc
Comment 1 Juan C. Montemayor 2011-06-25 13:16:37 PDT
Created attachment 98602 [details]
Patch
Comment 2 Juan C. Montemayor 2011-06-25 13:18:36 PDT
(In reply to comment #1)
> Created an attachment (id=98602) [details]
> Patch

For the javascritptcore-tests this patch causes no regressions in debug mode but causes two regressions in release mode. I am currently re-checking my work to find out why, but if anyone has any ideas...
Comment 3 Juan C. Montemayor 2011-06-25 13:28:21 PDT
(In reply to comment #2)
> (In reply to comment #1)
> > Created an attachment (id=98602) [details] [details]
> > Patch
> 
> For the javascritptcore-tests this patch causes no regressions in debug mode but causes two regressions in release mode. I am currently re-checking my work to find out why, but if anyone has any ideas...

Also, there are still some junk test outputs in the patch... I'll remove those once I fix the regression.
Comment 4 WebKit Review Bot 2011-06-25 14:36:56 PDT
Comment on attachment 98602 [details]
Patch

Attachment 98602 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8932894

New failing tests:
transitions/transition-end-event-transform.html
http/tests/media/reload-after-dialog.html
http/tests/media/pdf-served-as-pdf.html
Comment 5 WebKit Review Bot 2011-06-25 14:37:00 PDT
Created attachment 98607 [details]
Archive of layout-test-results from ec2-cr-linux-03

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-03  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 6 Oliver Hunt 2011-06-25 14:56:56 PDT
Comment on attachment 98602 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98602&action=review

> Source/JavaScriptCore/parser/Lexer.cpp:531
> +                    m_lexErrorMessage = "Only quote characters are allowed after \\u";

\u can only be followed by a unicode character number, or a closing quote

> Source/JavaScriptCore/parser/Lexer.cpp:582
> +                m_lexErrorMessage = "New-line or end of input is not allowed here";

Unexpected EOF

> Source/JavaScriptCore/parser/Lexer.cpp:1119
> +        m_lexErrorMessage = "Invalid character";

Should say what the character is -- and should possibly escape it if it's a control character.

> LayoutTests/fast/js/basic-strict-mode-expected.txt:184
> +PASS 'use strict';5.f threw exception SyntaxError: Identifiers cannot start with a number.

This error is bogus -- "Only digits may follow a decimal point" might be ebtter

> LayoutTests/fast/js/cyclic-proto-expected.txt:1
> +posix_spawn fatal error: 9!

This change is bogus
Comment 7 Juan C. Montemayor 2011-06-25 16:55:43 PDT
> > Source/JavaScriptCore/parser/Lexer.cpp:1119
> > +        m_lexErrorMessage = "Invalid character";
> 
> Should say what the character is -- and should possibly escape it if it's a control character.

So, I'm getting weird character values for this case... I'm getting (as int) 8205 for the removing-Cf-characters test, 247 for the parse test, and 827 for basic-strict-mode. These aren't ascii, amirite?

In any case, I get these errors - 
'use strict'Ì» threw exception SyntaxError: Invalid character ;.
(function(){'use strict'Ì»}) threw exception SyntaxError: Invalid character ;.
'use strict';Ì» threw exception SyntaxError: Invalid character ;.
(function(){'use strict';Ì»}) threw exception SyntaxError: Invalid character ;.

In the first two, there aren't any semicolons in the code, and in the second ones, I'm pretty sure what fails isn't the semicolon.

I tried not outputting any characters if they weren't ascii, and null terminating those that needed null terminating like so:

    case CharacterInvalid:
        if (m_current < 33)
            m_lexErrorMessage = String::format("Invalid character \%i",m_current).impl();
        else if (m_current > 127)
            m_lexErrorMessage = "Invalid character";
        else
            m_lexErrorMessage = String::format("Invalid character %i",m_current).impl();
        goto returnError;

but then the tests just showed "Invalid character" as output. Any recommendations?

Also, what should I do about the regressions in release mode?
Comment 8 Oliver Hunt 2011-06-25 17:18:19 PDT
(In reply to comment #7)
> > > Source/JavaScriptCore/parser/Lexer.cpp:1119
> > > +        m_lexErrorMessage = "Invalid character";
> > 
> > Should say what the character is -- and should possibly escape it if it's a control character.
> 
> So, I'm getting weird character values for this case... I'm getting (as int) 8205 for the removing-Cf-characters test, 247 for the parse test, and 827 for basic-strict-mode. These aren't ascii, amirite?
> 
> In any case, I get these errors - 
> 'use strict'Ì» threw exception SyntaxError: Invalid character ;.
> (function(){'use strict'Ì»}) threw exception SyntaxError: Invalid character ;.
> 'use strict';Ì» threw exception SyntaxError: Invalid character ;.
> (function(){'use strict';Ì»}) threw exception SyntaxError: Invalid character ;.
> 
> In the first two, there aren't any semicolons in the code, and in the second ones, I'm pretty sure what fails isn't the semicolon.
> 
> I tried not outputting any characters if they weren't ascii, and null terminating those that needed null terminating like so:
> 
>     case CharacterInvalid:
>         if (m_current < 33)
>             m_lexErrorMessage = String::format("Invalid character \%i",m_current).impl();
>         else if (m_current > 127)
>             m_lexErrorMessage = "Invalid character";
>         else
>             m_lexErrorMessage = String::format("Invalid character %i",m_current).impl();
>         goto returnError;
> 
> but then the tests just showed "Invalid character" as output. Any recommendations?
In the invlaid character case i'd just do escaped unicode (%04u i think should do it), specialising for \t, \r, \n, \v, etc
> 
> Also, what should I do about the regressions in release mode?

Work out what's causing them -- at a guess you're probably relying on an uninitialised value
Comment 9 Juan C. Montemayor 2011-06-26 00:27:28 PDT
> > but then the tests just showed "Invalid character" as output. Any recommendations?
> In the invlaid character case i'd just do escaped unicode (%04u i think should do it), specialising for \t, \r, \n, \v, etc

Using %04u I get this output:

PASS var ZWJ_Iâ€nside; threw exception SyntaxError: Invalid character 8205.

Is that fine?

> > 
> > Also, what should I do about the regressions in release mode?
> 
> Work out what's causing them -- at a guess you're probably relying on an uninitialised value

Brilliant! Fixed.
Comment 10 Oliver Hunt 2011-06-26 10:52:18 PDT
(In reply to comment #9)
> > > but then the tests just showed "Invalid character" as output. Any recommendations?
> > In the invlaid character case i'd just do escaped unicode (%04u i think should do it), specialising for \t, \r, \n, \v, etc
> 
> Using %04u I get this output:
> 
> PASS var ZWJ_Iâ€nside; threw exception SyntaxError: Invalid character 8205.
> 
> Is that fine?

Having thought about it maybe we want to do

printf("Invalid character %C (\\u%04u)", m_current, m_current);

But i'm not sure if Stirng::format supports %C

> 
> > > 
> > > Also, what should I do about the regressions in release mode?
> > 
> > Work out what's causing them -- at a guess you're probably relying on an uninitialised value
> 
> Brilliant! Fixed.

\o/
Comment 11 Juan C. Montemayor 2011-06-26 23:13:51 PDT
> > Using %04u I get this output:
> > 
> > PASS var ZWJ_Iâ€nside; threw exception SyntaxError: Invalid character 8205.
> > 
> > Is that fine?
> 
> Having thought about it maybe we want to do
> 
> printf("Invalid character %C (\\u%04u)", m_current, m_current);
> 
> But i'm not sure if Stirng::format supports %C
> 

I checked, and it doesn't... I'll add the \\u before the unicode, and figure out how I can implement the %C
Comment 12 Oliver Hunt 2011-06-27 08:45:25 PDT
(In reply to comment #11)
> > > Using %04u I get this output:
> > > 
> > > PASS var ZWJ_Iâ€nside; threw exception SyntaxError: Invalid character 8205.
> > > 
> > > Is that fine?
> > 
> > Having thought about it maybe we want to do
> > 
> > printf("Invalid character %C (\\u%04u)", m_current, m_current);
> > 
> > But i'm not sure if Stirng::format supports %C
> > 
> 
> I checked, and it doesn't... I'll add the \\u before the unicode, and figure out how I can implement the %C

Brainfart, just use %04d :d
Comment 13 Juan C. Montemayor 2011-06-27 12:48:54 PDT
Created attachment 98769 [details]
Patch

Should I run any other tests other than webkit-tests and javascriptcore-tests in both debug and release mode?
Comment 14 Oliver Hunt 2011-06-27 13:08:51 PDT
Comment on attachment 98769 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98769&action=review

Other than the above comments this looks good.  Once this is done we can move to JSON error messages or getting line a character information :D

> Source/JavaScriptCore/parser/Lexer.cpp:531
> +                    m_lexErrorMessage = "\\u can only be followed by a unicode character number, or a closing quote";

s/number/sequence i think would be a better wording

> Source/JavaScriptCore/parser/Lexer.cpp:569
> +                m_lexErrorMessage = "Problem reading character sequence";

I believe this should be "Unterminated string constant"

> Source/JavaScriptCore/parser/Lexer.cpp:1119
> +        m_lexErrorMessage = String::format("Invalid character (\\u%04u)", m_current).impl();

I think this should be a function for creating the error message so that simple characters that are invalid are displayed in a user friendly way, eg:
`, @, #, \n, \r, \v, \0

The rest can be shown as a unicode escape
Comment 15 Oliver Hunt 2011-06-27 13:09:50 PDT
Comment on attachment 98769 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98769&action=review

Other than the above comments this looks good.  Once this is done we can move to JSON error messages or getting line a character information :D

>> Source/JavaScriptCore/parser/Lexer.cpp:531
>> +                    m_lexErrorMessage = "\\u can only be followed by a unicode character number, or a closing quote";
> 
> s/number/sequence i think would be a better wording

s/number/sequence i think would be a better wording

>> Source/JavaScriptCore/parser/Lexer.cpp:569
>> +                m_lexErrorMessage = "Problem reading character sequence";
> 
> I believe this should be "Unterminated string constant"

I believe this should be "Unterminated string constant"

>> Source/JavaScriptCore/parser/Lexer.cpp:1119
>> +        m_lexErrorMessage = String::format("Invalid character (\\u%04u)", m_current).impl();
> 
> I think this should be a function for creating the error message so that simple characters that are invalid are displayed in a user friendly way, eg:
> `, @, #, \n, \r, \v, \0
> 
> The rest can be shown as a unicode escape

I think this should be a function for creating the error message so that simple characters that are invalid are displayed in a user friendly way, eg:
`, @, #, \n, \r, \v, \0

The rest can be shown as a unicode escape
Comment 16 Oliver Hunt 2011-06-27 13:10:18 PDT
Sorry for the noise, review screen seemed to go nuts on me :(
Comment 17 Juan C. Montemayor 2011-06-29 19:02:26 PDT
Created attachment 99208 [details]
Proposed patch

fixed wording of some errors and added function for creating the error message
Comment 18 Juan C. Montemayor 2011-06-29 19:26:34 PDT
Created attachment 99212 [details]
fixed patch

Fixed tab and also added file that will hopefully not break cr-linux :)
Comment 19 Juan C. Montemayor 2011-06-29 19:33:51 PDT
Comment on attachment 99212 [details]
fixed patch

my bad, accidentally uploaded this patch to this bug. I'll try to find out why the win bot is failing on my other patch
Comment 20 Juan C. Montemayor 2011-06-30 11:03:26 PDT
Created attachment 99335 [details]
proposed patch

fixed the issue that made the win bot fail
Comment 21 Geoffrey Garen 2011-06-30 14:05:52 PDT
Comment on attachment 99335 [details]
proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99335&action=review

Looking good -- the new error messages are a huge improvement. But I think this patch could be even better.

> Source/JavaScriptCore/parser/Lexer.cpp:237
> +ALWAYS_INLINE UString Lexer::getInvalidCharMessage()

I don't think you want to inline this -- it's the error case, so it should be out of line, for smaller code generation and less register pressure and instruction cache pollution in the non-error path.

> Source/JavaScriptCore/parser/Lexer.cpp:241
> +        return UString("Invalid character '\\0'").impl();

Since your return type is UString, you can just return a C string ("Invalid..."), and the UString constructor will do the rest. This construct you've used will create two UStrings for every call, instead of one.

> Source/JavaScriptCore/parser/Lexer.cpp:243
> +        return UString("Invalid character '\\n'").impl();

I think these messages would be clearer with a colon: "Invalid character: ...").

> Source/JavaScriptCore/parser/Lexer.cpp:554
> +                    m_lexErrorMessage = "\\u can only be followed by a unicode character sequence";

Unicode is a proper noun, so please capitalize it.

> Source/JavaScriptCore/parser/Lexer.cpp:562
> +                    m_lexErrorMessage = "Only valid numeric escape in strict mode is '\\0'";

To make this a complete sentence, say "The only valid..."

> Source/JavaScriptCore/parser/Lexer.cpp:1296
> +    m_lexErrorMessage = UString("Lexer error");

No need for "UString" here -- the constructor will do that for you.
Comment 22 Juan C. Montemayor 2011-06-30 16:47:13 PDT
Created attachment 99397 [details]
revised patch

Fixed details mentioned by Geoff
Comment 23 Oliver Hunt 2011-06-30 18:09:18 PDT
Comment on attachment 99397 [details]
revised patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99397&action=review

> Source/JavaScriptCore/parser/Lexer.cpp:253
> +        return "Invalid character: '\\#'";
> +    case 64:
> +        return "Invalid character: '\\@'";
> +    case 96:
> +        return "Invalid character: '\\`'";

These characters don't need to be escaped, so shouldn't be :-(

> Source/JavaScriptCore/parser/Lexer.cpp:1108
> +            m_lexErrorMessage = "Only digits may follow a decimal point";

I think this would be better as "At least one digit must occur after a decimal point"

> Source/JavaScriptCore/parser/Lexer.cpp:1296
> +    m_lexErrorMessage = "Lexer error";

Why have a string here at all in the clean state?

> Source/JavaScriptCore/parser/Lexer.h:78
> +            m_lexErrorMessage = "Lexer error";

Default error message should be empty.  By doing this assignment I believe you're adding at least two memory allocations and a memcpy to something we expect to be very fast.

> Source/JavaScriptCore/parser/Parser.cpp:54
> +    UString lexErrorMessage = lexError ? lexer.getErrorMessage() : UString();

I would do
lexErrorMessage = lexer.getErrorMessage();
ASSERT(lexErrorMessage.isNull() != lexError); // If there's been a lexer error there should be a message, if there hasn't been, there should not be a message.
Comment 24 Juan C. Montemayor 2011-07-01 10:53:45 PDT
Created attachment 99487 [details]
patch

updated patch based on Oliver's comments
Comment 25 WebKit Review Bot 2011-07-01 10:55:50 PDT
Attachment 99487 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/update-webkit', '--chromium']" exit_code: 2

Updating OpenSource
Current branch master is up to date.
Updating chromium port dependencies using gclient...

________ running '/usr/bin/python gyp_webkit' in '/mnt/git/webkit-style-queue/Source/WebKit/chromium'
Updating webkit projects from gyp files...
TError: /usr/bin/python gyp_webkit in /mnt/git/webkit-style-queue/Source/WebKit/chromium returned 1
raceback (most recent call last):
  File "gyp_webkit", line 117, in <module>
    sys.exit(gyp.main(args))
  File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/__init__.py", line 448, in main
    options.circular_check)
  File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/__init__.py", line 87, in Load
    depth, generator_input_info, check, circular_check)
  File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/input.py", line 2210, in Load
    depth, check)
  File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/input.py", line 421, in LoadTargetBuildFile
    includes, depth, check)
  File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/input.py", line 421, in LoadTargetBuildFile
    includes, depth, check)
  File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/input.py", line 355, in LoadTargetBuildFile
    includes, True, check)
  File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/input.py", line 210, in LoadOneBuildFile
    raise Exception("%s not found (cwd: %s)" % (build_file_path, os.getcwd()))
Exception: ../../WebKit/chromium/sql/sql.gyp not found (cwd: /mnt/git/webkit-style-queue/Source/WebKit/chromium) while loading dependencies of ../../WebKit/chromium/webkit/support/webkit_support.gyp while loading dependencies of WebKit.gyp while trying to load WebKit.gyp
Died at Tools/Scripts/update-webkit-chromium line 69.
No such file or directory at Tools/Scripts/update-webkit line 100.


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 26 WebKit Review Bot 2011-07-01 12:47:19 PDT
Comment on attachment 99487 [details]
patch

Clearing flags on attachment: 99487

Committed r90265: <http://trac.webkit.org/changeset/90265>
Comment 27 WebKit Review Bot 2011-07-01 12:47:28 PDT
All reviewed patches have been landed.  Closing bug.