WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
63340
Lexer error messages are not very informative
https://bugs.webkit.org/show_bug.cgi?id=63340
Summary
Lexer error messages are not very informative
Oliver Hunt
Reported
2011-06-24 12:12:46 PDT
Current lexer error messages basically produce a dump of the last sequence of characters seen by the lexer, without any indication of why those characters are wrong, it would be nice to get errors along the lines of "blah blah blah => "Unterminated string literal" "use strict"; "\03" => "Octal escapes are forbidden in strict mode" etc
Attachments
Patch
(48.63 KB, patch)
2011-06-25 13:16 PDT
,
Juan C. Montemayor
no flags
Details
Formatted Diff
Diff
Archive of layout-test-results from ec2-cr-linux-03
(1.26 MB, application/zip)
2011-06-25 14:37 PDT
,
WebKit Review Bot
no flags
Details
Patch
(44.95 KB, patch)
2011-06-27 12:48 PDT
,
Juan C. Montemayor
no flags
Details
Formatted Diff
Diff
Proposed patch
(46.27 KB, patch)
2011-06-29 19:02 PDT
,
Juan C. Montemayor
no flags
Details
Formatted Diff
Diff
fixed patch
(5.56 KB, patch)
2011-06-29 19:26 PDT
,
Juan C. Montemayor
no flags
Details
Formatted Diff
Diff
proposed patch
(46.20 KB, patch)
2011-06-30 11:03 PDT
,
Juan C. Montemayor
no flags
Details
Formatted Diff
Diff
revised patch
(46.14 KB, patch)
2011-06-30 16:47 PDT
,
Juan C. Montemayor
no flags
Details
Formatted Diff
Diff
patch
(46.07 KB, patch)
2011-07-01 10:53 PDT
,
Juan C. Montemayor
no flags
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Juan C. Montemayor
Comment 1
2011-06-25 13:16:37 PDT
Created
attachment 98602
[details]
Patch
Juan C. Montemayor
Comment 2
2011-06-25 13:18:36 PDT
(In reply to
comment #1
)
> Created an attachment (id=98602) [details] > Patch
For the javascritptcore-tests this patch causes no regressions in debug mode but causes two regressions in release mode. I am currently re-checking my work to find out why, but if anyone has any ideas...
Juan C. Montemayor
Comment 3
2011-06-25 13:28:21 PDT
(In reply to
comment #2
)
> (In reply to
comment #1
) > > Created an attachment (id=98602) [details] [details] > > Patch > > For the javascritptcore-tests this patch causes no regressions in debug mode but causes two regressions in release mode. I am currently re-checking my work to find out why, but if anyone has any ideas...
Also, there are still some junk test outputs in the patch... I'll remove those once I fix the regression.
WebKit Review Bot
Comment 4
2011-06-25 14:36:56 PDT
Comment on
attachment 98602
[details]
Patch
Attachment 98602
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/8932894
New failing tests: transitions/transition-end-event-transform.html http/tests/media/reload-after-dialog.html http/tests/media/pdf-served-as-pdf.html
WebKit Review Bot
Comment 5
2011-06-25 14:37:00 PDT
Created
attachment 98607
[details]
Archive of layout-test-results from ec2-cr-linux-03 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: ec2-cr-linux-03 Port: Chromium Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Oliver Hunt
Comment 6
2011-06-25 14:56:56 PDT
Comment on
attachment 98602
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=98602&action=review
> Source/JavaScriptCore/parser/Lexer.cpp:531 > + m_lexErrorMessage = "Only quote characters are allowed after \\u";
\u can only be followed by a unicode character number, or a closing quote
> Source/JavaScriptCore/parser/Lexer.cpp:582 > + m_lexErrorMessage = "New-line or end of input is not allowed here";
Unexpected EOF
> Source/JavaScriptCore/parser/Lexer.cpp:1119 > + m_lexErrorMessage = "Invalid character";
Should say what the character is -- and should possibly escape it if it's a control character.
> LayoutTests/fast/js/basic-strict-mode-expected.txt:184 > +PASS 'use strict';5.f threw exception SyntaxError: Identifiers cannot start with a number.
This error is bogus -- "Only digits may follow a decimal point" might be ebtter
> LayoutTests/fast/js/cyclic-proto-expected.txt:1 > +posix_spawn fatal error: 9!
This change is bogus
Juan C. Montemayor
Comment 7
2011-06-25 16:55:43 PDT
> > Source/JavaScriptCore/parser/Lexer.cpp:1119 > > + m_lexErrorMessage = "Invalid character"; > > Should say what the character is -- and should possibly escape it if it's a control character.
So, I'm getting weird character values for this case... I'm getting (as int) 8205 for the removing-Cf-characters test, 247 for the parse test, and 827 for basic-strict-mode. These aren't ascii, amirite? In any case, I get these errors - 'use strict'Ì» threw exception SyntaxError: Invalid character ;. (function(){'use strict'Ì»}) threw exception SyntaxError: Invalid character ;. 'use strict';Ì» threw exception SyntaxError: Invalid character ;. (function(){'use strict';Ì»}) threw exception SyntaxError: Invalid character ;. In the first two, there aren't any semicolons in the code, and in the second ones, I'm pretty sure what fails isn't the semicolon. I tried not outputting any characters if they weren't ascii, and null terminating those that needed null terminating like so: case CharacterInvalid: if (m_current < 33) m_lexErrorMessage = String::format("Invalid character \%i",m_current).impl(); else if (m_current > 127) m_lexErrorMessage = "Invalid character"; else m_lexErrorMessage = String::format("Invalid character %i",m_current).impl(); goto returnError; but then the tests just showed "Invalid character" as output. Any recommendations? Also, what should I do about the regressions in release mode?
Oliver Hunt
Comment 8
2011-06-25 17:18:19 PDT
(In reply to
comment #7
)
> > > Source/JavaScriptCore/parser/Lexer.cpp:1119 > > > + m_lexErrorMessage = "Invalid character"; > > > > Should say what the character is -- and should possibly escape it if it's a control character. > > So, I'm getting weird character values for this case... I'm getting (as int) 8205 for the removing-Cf-characters test, 247 for the parse test, and 827 for basic-strict-mode. These aren't ascii, amirite? > > In any case, I get these errors - > 'use strict'Ì» threw exception SyntaxError: Invalid character ;. > (function(){'use strict'Ì»}) threw exception SyntaxError: Invalid character ;. > 'use strict';Ì» threw exception SyntaxError: Invalid character ;. > (function(){'use strict';Ì»}) threw exception SyntaxError: Invalid character ;. > > In the first two, there aren't any semicolons in the code, and in the second ones, I'm pretty sure what fails isn't the semicolon. > > I tried not outputting any characters if they weren't ascii, and null terminating those that needed null terminating like so: > > case CharacterInvalid: > if (m_current < 33) > m_lexErrorMessage = String::format("Invalid character \%i",m_current).impl(); > else if (m_current > 127) > m_lexErrorMessage = "Invalid character"; > else > m_lexErrorMessage = String::format("Invalid character %i",m_current).impl(); > goto returnError; > > but then the tests just showed "Invalid character" as output. Any recommendations?
In the invlaid character case i'd just do escaped unicode (%04u i think should do it), specialising for \t, \r, \n, \v, etc
> > Also, what should I do about the regressions in release mode?
Work out what's causing them -- at a guess you're probably relying on an uninitialised value
Juan C. Montemayor
Comment 9
2011-06-26 00:27:28 PDT
> > but then the tests just showed "Invalid character" as output. Any recommendations? > In the invlaid character case i'd just do escaped unicode (%04u i think should do it), specialising for \t, \r, \n, \v, etc
Using %04u I get this output: PASS var ZWJ_Iâ€nside; threw exception SyntaxError: Invalid character 8205. Is that fine?
> > > > Also, what should I do about the regressions in release mode? > > Work out what's causing them -- at a guess you're probably relying on an uninitialised value
Brilliant! Fixed.
Oliver Hunt
Comment 10
2011-06-26 10:52:18 PDT
(In reply to
comment #9
)
> > > but then the tests just showed "Invalid character" as output. Any recommendations? > > In the invlaid character case i'd just do escaped unicode (%04u i think should do it), specialising for \t, \r, \n, \v, etc > > Using %04u I get this output: > > PASS var ZWJ_Iâ€nside; threw exception SyntaxError: Invalid character 8205. > > Is that fine?
Having thought about it maybe we want to do printf("Invalid character %C (\\u%04u)", m_current, m_current); But i'm not sure if Stirng::format supports %C
> > > > > > > Also, what should I do about the regressions in release mode? > > > > Work out what's causing them -- at a guess you're probably relying on an uninitialised value > > Brilliant! Fixed.
\o/
Juan C. Montemayor
Comment 11
2011-06-26 23:13:51 PDT
> > Using %04u I get this output: > > > > PASS var ZWJ_Iâ€nside; threw exception SyntaxError: Invalid character 8205. > > > > Is that fine? > > Having thought about it maybe we want to do > > printf("Invalid character %C (\\u%04u)", m_current, m_current); > > But i'm not sure if Stirng::format supports %C >
I checked, and it doesn't... I'll add the \\u before the unicode, and figure out how I can implement the %C
Oliver Hunt
Comment 12
2011-06-27 08:45:25 PDT
(In reply to
comment #11
)
> > > Using %04u I get this output: > > > > > > PASS var ZWJ_Iâ€nside; threw exception SyntaxError: Invalid character 8205. > > > > > > Is that fine? > > > > Having thought about it maybe we want to do > > > > printf("Invalid character %C (\\u%04u)", m_current, m_current); > > > > But i'm not sure if Stirng::format supports %C > > > > I checked, and it doesn't... I'll add the \\u before the unicode, and figure out how I can implement the %C
Brainfart, just use %04d :d
Juan C. Montemayor
Comment 13
2011-06-27 12:48:54 PDT
Created
attachment 98769
[details]
Patch Should I run any other tests other than webkit-tests and javascriptcore-tests in both debug and release mode?
Oliver Hunt
Comment 14
2011-06-27 13:08:51 PDT
Comment on
attachment 98769
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=98769&action=review
Other than the above comments this looks good. Once this is done we can move to JSON error messages or getting line a character information :D
> Source/JavaScriptCore/parser/Lexer.cpp:531 > + m_lexErrorMessage = "\\u can only be followed by a unicode character number, or a closing quote";
s/number/sequence i think would be a better wording
> Source/JavaScriptCore/parser/Lexer.cpp:569 > + m_lexErrorMessage = "Problem reading character sequence";
I believe this should be "Unterminated string constant"
> Source/JavaScriptCore/parser/Lexer.cpp:1119 > + m_lexErrorMessage = String::format("Invalid character (\\u%04u)", m_current).impl();
I think this should be a function for creating the error message so that simple characters that are invalid are displayed in a user friendly way, eg: `, @, #, \n, \r, \v, \0 The rest can be shown as a unicode escape
Oliver Hunt
Comment 15
2011-06-27 13:09:50 PDT
Comment on
attachment 98769
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=98769&action=review
Other than the above comments this looks good. Once this is done we can move to JSON error messages or getting line a character information :D
>> Source/JavaScriptCore/parser/Lexer.cpp:531 >> + m_lexErrorMessage = "\\u can only be followed by a unicode character number, or a closing quote"; > > s/number/sequence i think would be a better wording
s/number/sequence i think would be a better wording
>> Source/JavaScriptCore/parser/Lexer.cpp:569 >> + m_lexErrorMessage = "Problem reading character sequence"; > > I believe this should be "Unterminated string constant"
I believe this should be "Unterminated string constant"
>> Source/JavaScriptCore/parser/Lexer.cpp:1119 >> + m_lexErrorMessage = String::format("Invalid character (\\u%04u)", m_current).impl(); > > I think this should be a function for creating the error message so that simple characters that are invalid are displayed in a user friendly way, eg: > `, @, #, \n, \r, \v, \0 > > The rest can be shown as a unicode escape
I think this should be a function for creating the error message so that simple characters that are invalid are displayed in a user friendly way, eg: `, @, #, \n, \r, \v, \0 The rest can be shown as a unicode escape
Oliver Hunt
Comment 16
2011-06-27 13:10:18 PDT
Sorry for the noise, review screen seemed to go nuts on me :(
Juan C. Montemayor
Comment 17
2011-06-29 19:02:26 PDT
Created
attachment 99208
[details]
Proposed patch fixed wording of some errors and added function for creating the error message
Juan C. Montemayor
Comment 18
2011-06-29 19:26:34 PDT
Created
attachment 99212
[details]
fixed patch Fixed tab and also added file that will hopefully not break cr-linux :)
Juan C. Montemayor
Comment 19
2011-06-29 19:33:51 PDT
Comment on
attachment 99212
[details]
fixed patch my bad, accidentally uploaded this patch to this bug. I'll try to find out why the win bot is failing on my other patch
Juan C. Montemayor
Comment 20
2011-06-30 11:03:26 PDT
Created
attachment 99335
[details]
proposed patch fixed the issue that made the win bot fail
Geoffrey Garen
Comment 21
2011-06-30 14:05:52 PDT
Comment on
attachment 99335
[details]
proposed patch View in context:
https://bugs.webkit.org/attachment.cgi?id=99335&action=review
Looking good -- the new error messages are a huge improvement. But I think this patch could be even better.
> Source/JavaScriptCore/parser/Lexer.cpp:237 > +ALWAYS_INLINE UString Lexer::getInvalidCharMessage()
I don't think you want to inline this -- it's the error case, so it should be out of line, for smaller code generation and less register pressure and instruction cache pollution in the non-error path.
> Source/JavaScriptCore/parser/Lexer.cpp:241 > + return UString("Invalid character '\\0'").impl();
Since your return type is UString, you can just return a C string ("Invalid..."), and the UString constructor will do the rest. This construct you've used will create two UStrings for every call, instead of one.
> Source/JavaScriptCore/parser/Lexer.cpp:243 > + return UString("Invalid character '\\n'").impl();
I think these messages would be clearer with a colon: "Invalid character: ...").
> Source/JavaScriptCore/parser/Lexer.cpp:554 > + m_lexErrorMessage = "\\u can only be followed by a unicode character sequence";
Unicode is a proper noun, so please capitalize it.
> Source/JavaScriptCore/parser/Lexer.cpp:562 > + m_lexErrorMessage = "Only valid numeric escape in strict mode is '\\0'";
To make this a complete sentence, say "The only valid..."
> Source/JavaScriptCore/parser/Lexer.cpp:1296 > + m_lexErrorMessage = UString("Lexer error");
No need for "UString" here -- the constructor will do that for you.
Juan C. Montemayor
Comment 22
2011-06-30 16:47:13 PDT
Created
attachment 99397
[details]
revised patch Fixed details mentioned by Geoff
Oliver Hunt
Comment 23
2011-06-30 18:09:18 PDT
Comment on
attachment 99397
[details]
revised patch View in context:
https://bugs.webkit.org/attachment.cgi?id=99397&action=review
> Source/JavaScriptCore/parser/Lexer.cpp:253 > + return "Invalid character: '\\#'"; > + case 64: > + return "Invalid character: '\\@'"; > + case 96: > + return "Invalid character: '\\`'";
These characters don't need to be escaped, so shouldn't be :-(
> Source/JavaScriptCore/parser/Lexer.cpp:1108 > + m_lexErrorMessage = "Only digits may follow a decimal point";
I think this would be better as "At least one digit must occur after a decimal point"
> Source/JavaScriptCore/parser/Lexer.cpp:1296 > + m_lexErrorMessage = "Lexer error";
Why have a string here at all in the clean state?
> Source/JavaScriptCore/parser/Lexer.h:78 > + m_lexErrorMessage = "Lexer error";
Default error message should be empty. By doing this assignment I believe you're adding at least two memory allocations and a memcpy to something we expect to be very fast.
> Source/JavaScriptCore/parser/Parser.cpp:54 > + UString lexErrorMessage = lexError ? lexer.getErrorMessage() : UString();
I would do lexErrorMessage = lexer.getErrorMessage(); ASSERT(lexErrorMessage.isNull() != lexError); // If there's been a lexer error there should be a message, if there hasn't been, there should not be a message.
Juan C. Montemayor
Comment 24
2011-07-01 10:53:45 PDT
Created
attachment 99487
[details]
patch updated patch based on Oliver's comments
WebKit Review Bot
Comment 25
2011-07-01 10:55:50 PDT
Attachment 99487
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/update-webkit', '--chromium']" exit_code: 2 Updating OpenSource Current branch master is up to date. Updating chromium port dependencies using gclient... ________ running '/usr/bin/python gyp_webkit' in '/mnt/git/webkit-style-queue/Source/WebKit/chromium' Updating webkit projects from gyp files... TError: /usr/bin/python gyp_webkit in /mnt/git/webkit-style-queue/Source/WebKit/chromium returned 1 raceback (most recent call last): File "gyp_webkit", line 117, in <module> sys.exit(gyp.main(args)) File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/__init__.py", line 448, in main options.circular_check) File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/__init__.py", line 87, in Load depth, generator_input_info, check, circular_check) File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/input.py", line 2210, in Load depth, check) File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/input.py", line 421, in LoadTargetBuildFile includes, depth, check) File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/input.py", line 421, in LoadTargetBuildFile includes, depth, check) File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/input.py", line 355, in LoadTargetBuildFile includes, True, check) File "/mnt/git/webkit-style-queue/Source/WebKit/chromium/tools/gyp/pylib/gyp/input.py", line 210, in LoadOneBuildFile raise Exception("%s not found (cwd: %s)" % (build_file_path, os.getcwd())) Exception: ../../WebKit/chromium/sql/sql.gyp not found (cwd: /mnt/git/webkit-style-queue/Source/WebKit/chromium) while loading dependencies of ../../WebKit/chromium/webkit/support/webkit_support.gyp while loading dependencies of WebKit.gyp while trying to load WebKit.gyp Died at Tools/Scripts/update-webkit-chromium line 69. No such file or directory at Tools/Scripts/update-webkit line 100. If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 26
2011-07-01 12:47:19 PDT
Comment on
attachment 99487
[details]
patch Clearing flags on attachment: 99487 Committed
r90265
: <
http://trac.webkit.org/changeset/90265
>
WebKit Review Bot
Comment 27
2011-07-01 12:47:28 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug