Bug 6257 - Throw errors on invalid expressions (KJS merge)
Summary: Throw errors on invalid expressions (KJS merge)
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 420+
Hardware: Other Linux
: P2 Normal
Assignee: Alexey Proskuryakov
URL:
Keywords:
: 8043 (view as bug list)
Depends on:
Blocks: 7383
  Show dependency treegraph
 
Reported: 2005-12-27 12:18 PST by Maks Orlovich
Modified: 2006-12-29 18:11 PST (History)
5 users (show)

See Also:


Attachments
patch (2.91 KB, patch)
2005-12-27 12:19 PST, Maks Orlovich
no flags Details | Formatted Diff | Diff
patch -- with proper var names for POSIX path (2.91 KB, patch)
2005-12-27 12:22 PST, Maks Orlovich
no flags Details | Formatted Diff | Diff
patch, with additional \u support (4.78 KB, patch)
2005-12-27 13:40 PST, Maks Orlovich
ggaren: review-
Details | Formatted Diff | Diff
run-javascriptcore-tests patch (593 bytes, patch)
2006-01-09 14:38 PST, Geoffrey Garen
no flags Details | Formatted Diff | Diff
proposed patch (11.93 KB, patch)
2006-12-12 04:33 PST, Alexey Proskuryakov
ggaren: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Maks Orlovich 2005-12-27 12:18:06 PST
The attached makes the RegExp constructor throw JS error on syntax errors, and 
by doing that for the POSIX path also makes it work... It fixes one of the KJS 
RegExp tests, but makes one of the Mozilla tests throw an exception here where 
it used to "work" --- I do not think that's a regression. (It also shows the 
lack of \u support --- but that's a separate change). 
 
This is a merge of: 
r455153 | montanaro | 2005-08-30 14:00:05 -0400 (Tue, 30 Aug 2005) | 2 lines 
Backport from head, no need to call regcomp twice 
------------------------------------------------------------------------ 
r454902 | montanaro | 2005-08-30 04:33:46 -0400 (Tue, 30 Aug 2005) | 2 lines 
Committed fix for bug #110995 (non-pcre regular expressions do not work) 
------------------------------------------------------------------------ 
r393449 | porten | 2005-02-27 01:24:53 -0500 (Sun, 27 Feb 2005) | 2 lines 
throw exception when constructing an invalid regexp
Comment 1 Maks Orlovich 2005-12-27 12:19:08 PST
Created attachment 5305 [details]
patch

Also removes an another TODO which I think was no longer correct.
Comment 2 Maks Orlovich 2005-12-27 12:22:15 PST
Created attachment 5306 [details]
patch -- with proper var names for POSIX path

Oops, posix path may even build this time
Comment 3 Geoffrey Garen 2005-12-27 13:14:12 PST
We should figure out why the tests are newly-failing and fix them or the patch. 

A quick glance indicates that ecma_3/RegExp/regress-119909.js now reports as a failure because it 
throws an exception (whereas it used to silently ignore the regexp). I think that's a bug in how we interpret 
success/failure, because the test expects you to recognize the pattern as illegitimate. 

ecma_2/RegExp/properties-001.js used to pass fully, and now the exception prevents it from running. So 
we may need to surround a part of the test in a try/catch block. I'm also curious about why it used to pass 
-- probably because it barfed on an expression that wasn't expected to match in the first place.
Comment 4 Maks Orlovich 2005-12-27 13:40:34 PST
Created attachment 5308 [details]
patch, with additional \u support

To fix the properties-001 test, and one of our testcases, this also includes
the merge of the following (with one tweak -- it searches the string for \u
before applying the special path):
r354074 | porten | 2004-10-12 19:38:49 -0400 (Tue, 12 Oct 2004) | 4 lines

support \u escape sequences in regular expressions. They are non-standard -
at least they are not supported in Perl/PCRE/POSIX so we simply do our
own resolving of those.

---
This may need to be performance-tested, though.
Comment 5 Maciej Stachowiak 2005-12-29 00:51:55 PST
Comment on attachment 5308 [details]
patch, with additional \u support

Geoff, since you had the earlier comments, could you look at the new version?
Comment 6 Geoffrey Garen 2006-01-09 14:36:41 PST
Comment on attachment 5308 [details]
patch, with additional \u support

Sorry I took so long to review this.

1. ecma_3/RegExp/regress-119909.js. We can't check in code that causes test
regressions, so we need to figure out why this test still fails with your
latest patch. I suspect it's because (a) throwing an error prevents the rest of
the test from running or (b) testkjs interprets all thrown errors as failures.
Either way, the simplest solution is probably to surround the relevant part of
the test in a try/catch block.

2. Performance. I'm worried about the performance of scanning the expression
string an extra time. We've had performance problems in this area before. At a
minimum, I would suggest changing the code so that we only scan for \u if pcre
returns an error with the errorCode corresponding to "PCRE does not support \u
in RegExps." However, I think a better approach would be to have the lexer take
care of converting the unicode expression from the get-go.

3. To match flags() / _flags, _valid should be _isValid.

4. When pcre fails, doesn't it set _regex to NULL? If so, you can eliminate
_isValid, and have isValid() just return _regex.

r- for 1 & 2, but I'd love to see a patch that covers 3 & 4, too.

I hope this review doesn't sound too negative. You've been doing great work,
and we've had a lot of requests to fix this bug in particular, so it's great
that you're working on it.
Comment 7 Geoffrey Garen 2006-01-09 14:38:52 PST
Created attachment 5575 [details]
run-javascriptcore-tests patch

FYI, you'll have to apply this patch to get run-javascriptcore-tests to run in
a post-svn world. It looks like the Netscape tests were hard-coded for CVS.
Comment 8 Darin Adler 2006-01-09 14:59:57 PST
Regarding "the lexer take care of converting the unicode expression": We can do that if this is truly just a 
lexer-time feature. But if a regular expression that includes a "\" and a "u" has to work, even if it's 
composed out of separate strings, then we can't fix that by changing the lexer.

So we should do that if we can, but we need to test to see if that's right.
Comment 9 Maks Orlovich 2006-01-09 15:21:23 PST
I can only respond partly, since I am leaving for a conference soon'ish,and I 
better finish packing 
 
1. libpcre plain doesn't support as many nested parentheses as mozilla, so I 
do think the expectation for the test needs to be adjusted --- the point is 
that it shouldn't crash, anyway 
 
4. This code doesn't only work with PCRE, but also (somewhat) with POSIX   
regexps, if PCRE is not available. In that case, you can not 0-check.   
   
3. As Darin said, the lexer approach seems wrong for constructed strings.  
e.g. 'b a'.search(new RegExp("\\u" + "0020")) in firefox returns 1.  
  
2. I can certainly do a 2-pass approach, though it'll likely take some  
restructuring to get it to not overly complicate the code, considering 2.5 
RegExp runtimes supported. 
 
Comment 10 Geoffrey Garen 2006-01-11 09:50:12 PST
RegExp("\\u" + "0020") is a really interesting case -- something I hadn't considered. Both Firefox and 
MacIE honor it, and a quick reading of the spec suggests it's required, so, yes, that rules out the lexer 
solution. 

I'm not sure if we still care about supporting the POSIX library. Darin? Are there any platforms on which 
we can't use PCRE? Regardless, I think the 2-pass approach is worth it. It would just require factoring 
the Unicode escape converting code into its own function and calling that function as a fallback when 
the regular expression compiler returns an error.

With your patch applied, ecma_3/RegExp/regress-119909.js doesn't fail because it crashes. Rather, 
where we used to silently ignore the regular expression, we now throw an error. And our test suite 
interprets thrown errors as failures. Since we expect this test to throw an error, the simplest solution is 
to surround the test code in a try/catch block. The expectation for the test can remain the same.
Comment 11 rahul abrol 2006-03-29 07:01:48 PST
*** Bug 8043 has been marked as a duplicate of this bug. ***
Comment 12 Darin Adler 2006-03-29 10:40:19 PST
(In reply to comment #10)
> I'm not sure if we still care about supporting the POSIX library. Darin? Are
> there any platforms on which we can't use PCRE?

I don't think we do care about that. The KDE folks might, but I can't see why we'd even want to use a different regular expression library, since it's not going to work.
Comment 13 David Kilzer (:ddkilzer) 2006-12-10 07:44:08 PST
A fix for regexps with \u sequences was committed with Bug 7445 and later fixed with Bug 11501 using a two-pass approach as noted in Comment #9 and Comment #10 above.

Can this bug be closed?

Comment 14 Alexey Proskuryakov 2006-12-12 04:33:41 PST
Created attachment 11818 [details]
proposed patch

Tweaked the patch to also report the error message; removed the Unicode escapes part; a few style fixes.
Comment 15 Alexey Proskuryakov 2006-12-12 04:37:03 PST
Comment on attachment 5575 [details]
run-javascriptcore-tests patch

>FYI, you'll have to apply this patch to get run-javascriptcore-tests to run in
>a post-svn world. It looks like the Netscape tests were hard-coded for CVS.

This was fixed in a different way (by putting a correct Mixed.pm version number here).
Comment 16 Maks Orlovich 2006-12-12 08:17:27 PST
Thanks for doing this, Alexey
Comment 17 Geoffrey Garen 2006-12-12 11:59:33 PST
Comment on attachment 11818 [details]
proposed patch

r=me
Comment 18 Alexey Proskuryakov 2006-12-12 12:23:55 PST
Committed revision 18182.
Comment 19 David Kilzer (:ddkilzer) 2006-12-29 18:11:21 PST
(In reply to comment #9)
> 1. libpcre plain doesn't support as many nested parentheses as mozilla, so I 
> do think the expectation for the test needs to be adjusted --- the point is 
> that it shouldn't crash, anyway

For what it's worth, pcre-7.0 removes the limitation of more than 200 nested parenthesis (in lieu of other trade-offs).