Summary: | Throw errors on invalid expressions (KJS merge) | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Maks Orlovich <maksim> | ||||||||||||
Component: | JavaScriptCore | Assignee: | Alexey Proskuryakov <ap> | ||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||
Severity: | Normal | CC: | ap, ddkilzer, ian, solushex, solutionarchitecture | ||||||||||||
Priority: | P2 | ||||||||||||||
Version: | 420+ | ||||||||||||||
Hardware: | Other | ||||||||||||||
OS: | Linux | ||||||||||||||
Bug Depends on: | |||||||||||||||
Bug Blocks: | 7383 | ||||||||||||||
Attachments: |
|
Description
Maks Orlovich
2005-12-27 12:18:06 PST
Created attachment 5305 [details]
patch
Also removes an another TODO which I think was no longer correct.
Created attachment 5306 [details]
patch -- with proper var names for POSIX path
Oops, posix path may even build this time
We should figure out why the tests are newly-failing and fix them or the patch. A quick glance indicates that ecma_3/RegExp/regress-119909.js now reports as a failure because it throws an exception (whereas it used to silently ignore the regexp). I think that's a bug in how we interpret success/failure, because the test expects you to recognize the pattern as illegitimate. ecma_2/RegExp/properties-001.js used to pass fully, and now the exception prevents it from running. So we may need to surround a part of the test in a try/catch block. I'm also curious about why it used to pass -- probably because it barfed on an expression that wasn't expected to match in the first place. Created attachment 5308 [details] patch, with additional \u support To fix the properties-001 test, and one of our testcases, this also includes the merge of the following (with one tweak -- it searches the string for \u before applying the special path): r354074 | porten | 2004-10-12 19:38:49 -0400 (Tue, 12 Oct 2004) | 4 lines support \u escape sequences in regular expressions. They are non-standard - at least they are not supported in Perl/PCRE/POSIX so we simply do our own resolving of those. --- This may need to be performance-tested, though. Comment on attachment 5308 [details]
patch, with additional \u support
Geoff, since you had the earlier comments, could you look at the new version?
Comment on attachment 5308 [details]
patch, with additional \u support
Sorry I took so long to review this.
1. ecma_3/RegExp/regress-119909.js. We can't check in code that causes test
regressions, so we need to figure out why this test still fails with your
latest patch. I suspect it's because (a) throwing an error prevents the rest of
the test from running or (b) testkjs interprets all thrown errors as failures.
Either way, the simplest solution is probably to surround the relevant part of
the test in a try/catch block.
2. Performance. I'm worried about the performance of scanning the expression
string an extra time. We've had performance problems in this area before. At a
minimum, I would suggest changing the code so that we only scan for \u if pcre
returns an error with the errorCode corresponding to "PCRE does not support \u
in RegExps." However, I think a better approach would be to have the lexer take
care of converting the unicode expression from the get-go.
3. To match flags() / _flags, _valid should be _isValid.
4. When pcre fails, doesn't it set _regex to NULL? If so, you can eliminate
_isValid, and have isValid() just return _regex.
r- for 1 & 2, but I'd love to see a patch that covers 3 & 4, too.
I hope this review doesn't sound too negative. You've been doing great work,
and we've had a lot of requests to fix this bug in particular, so it's great
that you're working on it.
Created attachment 5575 [details]
run-javascriptcore-tests patch
FYI, you'll have to apply this patch to get run-javascriptcore-tests to run in
a post-svn world. It looks like the Netscape tests were hard-coded for CVS.
Regarding "the lexer take care of converting the unicode expression": We can do that if this is truly just a lexer-time feature. But if a regular expression that includes a "\" and a "u" has to work, even if it's composed out of separate strings, then we can't fix that by changing the lexer. So we should do that if we can, but we need to test to see if that's right. I can only respond partly, since I am leaving for a conference soon'ish,and I better finish packing 1. libpcre plain doesn't support as many nested parentheses as mozilla, so I do think the expectation for the test needs to be adjusted --- the point is that it shouldn't crash, anyway 4. This code doesn't only work with PCRE, but also (somewhat) with POSIX regexps, if PCRE is not available. In that case, you can not 0-check. 3. As Darin said, the lexer approach seems wrong for constructed strings. e.g. 'b a'.search(new RegExp("\\u" + "0020")) in firefox returns 1. 2. I can certainly do a 2-pass approach, though it'll likely take some restructuring to get it to not overly complicate the code, considering 2.5 RegExp runtimes supported. RegExp("\\u" + "0020") is a really interesting case -- something I hadn't considered. Both Firefox and MacIE honor it, and a quick reading of the spec suggests it's required, so, yes, that rules out the lexer solution. I'm not sure if we still care about supporting the POSIX library. Darin? Are there any platforms on which we can't use PCRE? Regardless, I think the 2-pass approach is worth it. It would just require factoring the Unicode escape converting code into its own function and calling that function as a fallback when the regular expression compiler returns an error. With your patch applied, ecma_3/RegExp/regress-119909.js doesn't fail because it crashes. Rather, where we used to silently ignore the regular expression, we now throw an error. And our test suite interprets thrown errors as failures. Since we expect this test to throw an error, the simplest solution is to surround the test code in a try/catch block. The expectation for the test can remain the same. *** Bug 8043 has been marked as a duplicate of this bug. *** (In reply to comment #10) > I'm not sure if we still care about supporting the POSIX library. Darin? Are > there any platforms on which we can't use PCRE? I don't think we do care about that. The KDE folks might, but I can't see why we'd even want to use a different regular expression library, since it's not going to work. A fix for regexps with \u sequences was committed with Bug 7445 and later fixed with Bug 11501 using a two-pass approach as noted in Comment #9 and Comment #10 above. Can this bug be closed? Created attachment 11818 [details]
proposed patch
Tweaked the patch to also report the error message; removed the Unicode escapes part; a few style fixes.
Comment on attachment 5575 [details] run-javascriptcore-tests patch >FYI, you'll have to apply this patch to get run-javascriptcore-tests to run in >a post-svn world. It looks like the Netscape tests were hard-coded for CVS. This was fixed in a different way (by putting a correct Mixed.pm version number here). Thanks for doing this, Alexey Comment on attachment 11818 [details]
proposed patch
r=me
Committed revision 18182. (In reply to comment #9) > 1. libpcre plain doesn't support as many nested parentheses as mozilla, so I > do think the expectation for the test needs to be adjusted --- the point is > that it shouldn't crash, anyway For what it's worth, pcre-7.0 removes the limitation of more than 200 nested parenthesis (in lieu of other trade-offs). |