Bug 6257

Summary:

Throw errors on invalid expressions (KJS merge)

Product:

WebKit

Reporter:

Maks Orlovich <maksim>

Component:

JavaScriptCore

Assignee:

Alexey Proskuryakov <ap>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

ap, ddkilzer, ian, solushex, solutionarchitecture

Priority:

Version:

420+

Hardware:

Other

OS:

Linux

Bug Depends on:

Bug Blocks:

7383

Attachments:

Description	Flags
patch	none
patch -- with proper var names for POSIX path	none
patch, with additional \u support	ggaren: review-
run-javascriptcore-tests patch	none
proposed patch	ggaren: review+

Maks Orlovich

Reported 2005-12-27 12:18:06 PST

The attached makes the RegExp constructor throw JS error on syntax errors, and by doing that for the POSIX path also makes it work... It fixes one of the KJS RegExp tests, but makes one of the Mozilla tests throw an exception here where it used to "work" --- I do not think that's a regression. (It also shows the lack of \u support --- but that's a separate change). This is a merge of: r455153 | montanaro | 2005-08-30 14:00:05 -0400 (Tue, 30 Aug 2005) | 2 lines Backport from head, no need to call regcomp twice ------------------------------------------------------------------------ r454902 | montanaro | 2005-08-30 04:33:46 -0400 (Tue, 30 Aug 2005) | 2 lines Committed fix for bug #110995 (non-pcre regular expressions do not work) ------------------------------------------------------------------------ r393449 | porten | 2005-02-27 01:24:53 -0500 (Sun, 27 Feb 2005) | 2 lines throw exception when constructing an invalid regexp

Attachments
patch (2.91 KB, patch) 2005-12-27 12:19 PST, Maks Orlovich	no flags	Details Formatted Diff Diff
patch -- with proper var names for POSIX path (2.91 KB, patch) 2005-12-27 12:22 PST, Maks Orlovich	no flags	Details Formatted Diff Diff
patch, with additional \u support (4.78 KB, patch) 2005-12-27 13:40 PST, Maks Orlovich	ggaren: review-	Details Formatted Diff Diff
run-javascriptcore-tests patch (593 bytes, patch) 2006-01-09 14:38 PST, Geoffrey Garen	no flags	Details Formatted Diff Diff
proposed patch (11.93 KB, patch) 2006-12-12 04:33 PST, Alexey Proskuryakov	ggaren: review+	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Maks Orlovich

Comment 1 2005-12-27 12:19:08 PST

Created attachment 5305 [details] patch Also removes an another TODO which I think was no longer correct.

Maks Orlovich

Comment 2 2005-12-27 12:22:15 PST

Created attachment 5306 [details] patch -- with proper var names for POSIX path Oops, posix path may even build this time

Geoffrey Garen

Comment 3 2005-12-27 13:14:12 PST

We should figure out why the tests are newly-failing and fix them or the patch. A quick glance indicates that ecma_3/RegExp/regress-119909.js now reports as a failure because it throws an exception (whereas it used to silently ignore the regexp). I think that's a bug in how we interpret success/failure, because the test expects you to recognize the pattern as illegitimate. ecma_2/RegExp/properties-001.js used to pass fully, and now the exception prevents it from running. So we may need to surround a part of the test in a try/catch block. I'm also curious about why it used to pass -- probably because it barfed on an expression that wasn't expected to match in the first place.

Maks Orlovich

Comment 4 2005-12-27 13:40:34 PST

Created attachment 5308 [details] patch, with additional \u support To fix the properties-001 test, and one of our testcases, this also includes the merge of the following (with one tweak -- it searches the string for \u before applying the special path): r354074 | porten | 2004-10-12 19:38:49 -0400 (Tue, 12 Oct 2004) | 4 lines support \u escape sequences in regular expressions. They are non-standard - at least they are not supported in Perl/PCRE/POSIX so we simply do our own resolving of those. --- This may need to be performance-tested, though.

Maciej Stachowiak

Comment 5 2005-12-29 00:51:55 PST

Comment on attachment 5308 [details] patch, with additional \u support Geoff, since you had the earlier comments, could you look at the new version?

Geoffrey Garen

Comment 6 2006-01-09 14:36:41 PST

Comment on attachment 5308 [details] patch, with additional \u support Sorry I took so long to review this. 1. ecma_3/RegExp/regress-119909.js. We can't check in code that causes test regressions, so we need to figure out why this test still fails with your latest patch. I suspect it's because (a) throwing an error prevents the rest of the test from running or (b) testkjs interprets all thrown errors as failures. Either way, the simplest solution is probably to surround the relevant part of the test in a try/catch block. 2. Performance. I'm worried about the performance of scanning the expression string an extra time. We've had performance problems in this area before. At a minimum, I would suggest changing the code so that we only scan for \u if pcre returns an error with the errorCode corresponding to "PCRE does not support \u in RegExps." However, I think a better approach would be to have the lexer take care of converting the unicode expression from the get-go. 3. To match flags() / _flags, _valid should be _isValid. 4. When pcre fails, doesn't it set _regex to NULL? If so, you can eliminate _isValid, and have isValid() just return _regex. r- for 1 & 2, but I'd love to see a patch that covers 3 & 4, too. I hope this review doesn't sound too negative. You've been doing great work, and we've had a lot of requests to fix this bug in particular, so it's great that you're working on it.

Geoffrey Garen

Comment 7 2006-01-09 14:38:52 PST

Created attachment 5575 [details] run-javascriptcore-tests patch FYI, you'll have to apply this patch to get run-javascriptcore-tests to run in a post-svn world. It looks like the Netscape tests were hard-coded for CVS.

Darin Adler

Comment 8 2006-01-09 14:59:57 PST

Regarding "the lexer take care of converting the unicode expression": We can do that if this is truly just a lexer-time feature. But if a regular expression that includes a "\" and a "u" has to work, even if it's composed out of separate strings, then we can't fix that by changing the lexer. So we should do that if we can, but we need to test to see if that's right.

Maks Orlovich

Comment 9 2006-01-09 15:21:23 PST

I can only respond partly, since I am leaving for a conference soon'ish,and I better finish packing 1. libpcre plain doesn't support as many nested parentheses as mozilla, so I do think the expectation for the test needs to be adjusted --- the point is that it shouldn't crash, anyway 4. This code doesn't only work with PCRE, but also (somewhat) with POSIX regexps, if PCRE is not available. In that case, you can not 0-check. 3. As Darin said, the lexer approach seems wrong for constructed strings. e.g. 'b a'.search(new RegExp("\\u" + "0020")) in firefox returns 1. 2. I can certainly do a 2-pass approach, though it'll likely take some restructuring to get it to not overly complicate the code, considering 2.5 RegExp runtimes supported.

Geoffrey Garen

Comment 10 2006-01-11 09:50:12 PST

RegExp("\\u" + "0020") is a really interesting case -- something I hadn't considered. Both Firefox and MacIE honor it, and a quick reading of the spec suggests it's required, so, yes, that rules out the lexer solution. I'm not sure if we still care about supporting the POSIX library. Darin? Are there any platforms on which we can't use PCRE? Regardless, I think the 2-pass approach is worth it. It would just require factoring the Unicode escape converting code into its own function and calling that function as a fallback when the regular expression compiler returns an error. With your patch applied, ecma_3/RegExp/regress-119909.js doesn't fail because it crashes. Rather, where we used to silently ignore the regular expression, we now throw an error. And our test suite interprets thrown errors as failures. Since we expect this test to throw an error, the simplest solution is to surround the test code in a try/catch block. The expectation for the test can remain the same.

rahul abrol

Comment 11 2006-03-29 07:01:48 PST

*** Bug 8043 has been marked as a duplicate of this bug. ***

Darin Adler

Comment 12 2006-03-29 10:40:19 PST

(In reply to comment #10) > I'm not sure if we still care about supporting the POSIX library. Darin? Are > there any platforms on which we can't use PCRE? I don't think we do care about that. The KDE folks might, but I can't see why we'd even want to use a different regular expression library, since it's not going to work.

David Kilzer (:ddkilzer)

Comment 13 2006-12-10 07:44:08 PST

A fix for regexps with \u sequences was committed with Bug 7445 and later fixed with Bug 11501 using a two-pass approach as noted in Comment #9 and Comment #10 above. Can this bug be closed?

Alexey Proskuryakov

Comment 14 2006-12-12 04:33:41 PST

Created attachment 11818 [details] proposed patch Tweaked the patch to also report the error message; removed the Unicode escapes part; a few style fixes.

Alexey Proskuryakov

Comment 15 2006-12-12 04:37:03 PST

Comment on attachment 5575 [details] run-javascriptcore-tests patch >FYI, you'll have to apply this patch to get run-javascriptcore-tests to run in >a post-svn world. It looks like the Netscape tests were hard-coded for CVS. This was fixed in a different way (by putting a correct Mixed.pm version number here).

Maks Orlovich

Comment 16 2006-12-12 08:17:27 PST

Thanks for doing this, Alexey

Geoffrey Garen

Comment 17 2006-12-12 11:59:33 PST

Comment on attachment 11818 [details] proposed patch r=me

Alexey Proskuryakov

Comment 18 2006-12-12 12:23:55 PST

Committed revision 18182.

David Kilzer (:ddkilzer)

Comment 19 2006-12-29 18:11:21 PST

(In reply to comment #9) > 1. libpcre plain doesn't support as many nested parentheses as mozilla, so I > do think the expectation for the test needs to be adjusted --- the point is > that it shouldn't crash, anyway For what it's worth, pcre-7.0 removes the limitation of more than 200 nested parenthesis (in lieu of other trade-offs).

Note You need to log in before you can comment on or make changes to this bug.