RESOLVED FIXED 16696
JSCRE fails fails to match Acid3 regexp
https://bugs.webkit.org/show_bug.cgi?id=16696
Summary JSCRE fails fails to match Acid3 regexp
Eric Seidel (no email)
Reported 2008-01-01 01:46:04 PST
JSCRE fails fails to match Acid3 regexp This is the assert we fail: assert(x == 4, "/(\\3)(\\1)(a)/ failed to match 'cat'"); I'm really not sure why that assert is valid in the first place. function () { // test 86: Regular Expressions var ok = true; try { eval("/TA[])]/.exec('TA]')"); // JS regexps aren't like Perl regexps, if their character // classes start with a ] that means they're empty. So this // is a syntax error; if we get here it's a bug. ok = false; } catch (e) { } assert(ok, "orphaned bracket not considered parse error in regular expression literal"); try { if (eval("/[]/.exec('')")) ok = false; } catch (e) { ok = false; } assert(ok, "/[]/ either failed to parse or matched something"); var x = /(\3)(\1)(a)/.exec('cat'); assert(x == 4, "/(\\3)(\\1)(a)/ failed to match 'cat'"); assert(x.length == 4, "/(\\3)(\\1)(a)/ failed to return four components"); assert(x[0] == "a", "/(\\3)(\\1)(a)/ failed to find 'a' in 'cat'"); assert(x[1] == "", "/(\\3)(\\1)(a)/ failed to find '' in 'cat' as first part"); assert(x[2] == "", "/(\\3)(\\1)(a)/ failed to find '' in 'cat' as second part"); assert(x[3] == "a", "/(\\3)(\\1)(a)/ failed to find 'a' in 'cat' as third part"); return 6; },
Attachments
patch (11.30 KB, patch)
2008-01-02 00:27 PST, Darin Adler
ggaren: review+
Darin Adler
Comment 1 2008-01-01 09:49:33 PST
There may be a problem with the test here; it looks to me like that expression requires three "a" characters in a row, so the test string would need to be "caaat". But also, our regular expression engine has a bug (and IE shares the bug, which is one reason I didn't fix it), where we don't recognize the \3 as a backreference because it's a forward reference. At the time we process it we don't know that there are 3 capturing brackets in the expression, so we treat it as an octal escape. To fix that we'll need to scan the regular expression for the number of brackets first before we do other computations on it. A simpler test case is just: /\1(a)/.exec('aa') Which will return null in Safari, but should return "a,a".
Eric Seidel (no email)
Comment 2 2008-01-01 11:11:37 PST
CCing Ian just to make sure he sees this.
Ian 'Hixie' Hickson
Comment 3 2008-01-01 14:29:46 PST
Forward references to capture brackets that haven't been seen yet, in ES4, are defined to always match the empty string. So this: /a(\2)(x)a/ ...matches only "axa", not "axxa", and on a positive match always returns ['axa', '', 'x', 'a']. FWIW, ES4 regexps are not Perl-compatible in this and many other edge cases.
Darin Adler
Comment 4 2008-01-01 14:43:55 PST
(In reply to comment #3) > Forward references to capture brackets that haven't been seen yet, in ES4, are defined to always match the empty string. So this: > /a(\2)(x)a/ > ...matches only "axa", not "axxa", and on a positive match always returns > ['axa', '', 'x', 'a']. OK, I see. I believe I made a fix to our code to handle that correctly a while back. Then I guess it's just the issue that our regular expression code doesn't recognize the forward reference as a capture at all, because of the quirk of treating them as octal escapes if they are higher than the number of capturing brackets. And the fact that the number of capturing brackets is based on the point in the parsing that we're at. I think a fix requires another pass over the regular expression to count the capturing brackets.
Eric Seidel (no email)
Comment 5 2008-01-01 15:07:14 PST
The specific supporting text according to Hixie: http://bclary.com/2004/11/07/#a-15.10.2.9 Step 8. sub-steps 2-3
Ian 'Hixie' Hickson
Comment 6 2008-01-01 15:16:27 PST
Er, I meant ES3. DecimalEscape in regexps (\0 to \9) are only ever backreferences in ES3. However, you still to raise a SyntaxError exception if the digit exceeds the number of captures in the regexp. But I suppose you don't need a second pass for that.
Darin Adler
Comment 7 2008-01-01 23:25:20 PST
(In reply to comment #6) > DecimalEscape in regexps (\0 to \9) are only ever backreferences in ES3. That's true. The ES3 B.1.2 appendix only lists octal escape sequences as an extension for string literals, not regular expressions. But both Gecko and IE support octal escape sequences as well as backreferences, and I believe they use the number of capturing parentheses in the expression as a cut-off. For example, try this expression: /\1/.exec(String.fromCharCode(1)).toString().charCodeAt(0) In Gecko this returns the value "1". > However, you still to raise a SyntaxError exception if the digit exceeds the > number of captures in the regexp. But I suppose you don't need a second pass > for that. I suspect the octal escape sequences are used in real-life web pages. So I do think I need an extra pass over the regular expression.
Darin Adler
Comment 8 2008-01-01 23:28:20 PST
Maciej, Geoff, this is a flaw in ECMAScript that we should get fixed for version 4. It doesn't specify the octal escape sequences for regular expressions, only string literals.
Darin Adler
Comment 9 2008-01-01 23:51:33 PST
(In reply to comment #0) > assert(x == 4, "/(\\3)(\\1)(a)/ failed to match 'cat'"); I don't understand how this could be right. The value of x is an array, not the number 4. > assert(x.length == 4, "/(\\3)(\\1)(a)/ failed to return four components"); > assert(x[0] == "a", "/(\\3)(\\1)(a)/ failed to find 'a' in 'cat'"); > assert(x[1] == "", "/(\\3)(\\1)(a)/ failed to find '' in 'cat' as first part"); > assert(x[2] == "", "/(\\3)(\\1)(a)/ failed to find '' in 'cat' as second part"); > assert(x[3] == "a", "/(\\3)(\\1)(a)/ failed to find 'a' in 'cat' as third part"); We seem to pass all of these. I suspect this is just a mistake in the test.
Darin Adler
Comment 10 2008-01-02 00:10:45 PST
We were passing all of this in Safari 3, but now it is broken. Patch on its way.
Darin Adler
Comment 11 2008-01-02 00:27:47 PST
Geoffrey Garen
Comment 12 2008-01-02 15:28:38 PST
Comment on attachment 18235 [details] patch r=me I wish the PCRE tests in fast/regexp were somehow separate from the rest.
Darin Adler
Comment 13 2008-01-02 22:59:04 PST
Committed revision 29110.
Note You need to log in before you can comment on or make changes to this bug.