Bug 21232 - RegExp.prototype.test bug with '@'
Summary: RegExp.prototype.test bug with '@'
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
: P1 Normal
Assignee: Gavin Barraclough
Keywords: Regression
Depends on:
Reported: 2008-09-29 23:44 PDT by kangax
Modified: 2008-10-02 09:46 PDT (History)
1 user (show)

See Also:

The patch (5.19 KB, patch)
2008-09-30 10:26 PDT, Gavin Barraclough
darin: review-
Details | Formatted Diff | Diff
The patch - now with added layouttesty goodness! (11.12 KB, patch)
2008-10-02 03:39 PDT, Gavin Barraclough
no flags Details | Formatted Diff | Diff
Layout tests have Changelogs too, y'know. D'oh. (11.91 KB, patch)
2008-10-02 03:47 PDT, Gavin Barraclough
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description kangax 2008-09-29 23:44:17 PDT
The following regexp returns `true` in latest nightly (r37056), but should return `false` (and does return `false` in Safari 3.1)

(/^[,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]*$/).test('@');

Thank you.
Comment 1 Alexey Proskuryakov 2008-09-30 07:38:12 PDT
Confirmed as a regression.
Comment 2 Gavin Barraclough 2008-09-30 09:22:28 PDT
There are two problems here, relating to the handling of hyphens in character ranges.

(1) In order to prevent [a\-c] from being treated as [a-c], we're forcing a flush after the hyphen has been added to the class.  Unfortunately this will give incorrect behaviour on the regexp [\--@], which should generate the range '-'..'@', but instead due to the flush will add '-' twice and '@' once to the class, all treated as separate characters.  The easy solution looks like it would be to flush prior to adding '\-' to the class, but this is also incorrect, since [+-\-] would now add separate characters rather than a range.  In order to prevent '\-' as being treated as a regular '-', but to still allow it itself to be a part of a range, we should flush the class under construction before adding the hyphen only if m_isPendingDash is not already set (since if m_isPendingDash is set this is a range, and the escaped hyphen is the last character in the range).

(2) The second bug is that we're not reseting m_isPendingDash when we flush.  So, given the regular expression [oh-\shoot, we'll add the 'o', be in a state that the 'h' and the hyphen are pending, when we see the character class next we'll flush, adding the 'h' and the hyphen, putting ourselves in an illegal state where m_isPendingDash is set but m_charBuffer isn't.  When the subsequent 'h' and 'o' are added to the set these will be treated as a range, since m_isPendingDash was not cleared.  This bug is not restricted to escaped hypens, and can be defended against will an assert when adding new chars that we're not in a state with m_isPendingDash set but not m_charBuffer.

In combination, the bug in this specific regexp is that the range '+'..'E' is getting added to the class.

patch coming.
Comment 3 Gavin Barraclough 2008-09-30 10:26:30 PDT
Created attachment 23942 [details]
The patch
Comment 4 Darin Adler 2008-09-30 10:49:58 PDT
Comment on attachment 23942 [details]
The patch

The fix looks fine, but we need to add test cases to LayoutTests for this. The patch should include the new tests and test results as well as the bug fix.

review- for now because of the lack of a test.
Comment 5 Gavin Barraclough 2008-10-02 03:39:00 PDT
Created attachment 24013 [details]
The patch - now with added layouttesty goodness!
Comment 6 Gavin Barraclough 2008-10-02 03:47:12 PDT
Created attachment 24015 [details]
Layout tests have Changelogs too, y'know.  D'oh.
Comment 7 Darin Adler 2008-10-02 09:02:44 PDT
Comment on attachment 24015 [details]
Layout tests have Changelogs too, y'know.  D'oh.

+//var regexp03a = /[1-3]+/.exec("[-]");
+//shouldBe('regexp03a.toString()', '"[-]"');
+//var regexp03a = /[\--o]+/.exec("-ok");
+//shouldBe('regexp03a.toString()', '"[-]"');

Why are those commented out?

Normally, we like to include tests even if they show failures.

Comment 8 Gavin Barraclough 2008-10-02 09:46:17 PDT
The commented out tests were incorrect, I've now deleted them (correct versions of these checks were these:

var regexp03a = /[\--0]+/.exec(",-.01");
shouldBe('regexp03a.toString()', '"-.0"');
var regexp03b = /[+-\-]+/.exec("*+,-.");
shouldBe('regexp03b.toString()', '"+,-"');

- and these are now commented.)
Comment 9 Gavin Barraclough 2008-10-02 09:46:49 PDT
Sending        JavaScriptCore/ChangeLog
Sending        JavaScriptCore/wrec/CharacterClassConstructor.cpp
Sending        JavaScriptCore/wrec/CharacterClassConstructor.h
Sending        JavaScriptCore/wrec/WREC.cpp
Sending        JavaScriptCore/wrec/WREC.h
Sending        LayoutTests/ChangeLog
Adding         LayoutTests/fast/js/regexp-ranges-and-escaped-hyphens-expected.txt
Adding         LayoutTests/fast/js/regexp-ranges-and-escaped-hyphens.html
Adding         LayoutTests/fast/js/resources/regexp-ranges-and-escaped-hyphens.js
Transmitting file data .........
Committed revision 37194.