Bug 10370

Summary: RegExp fails to match non-ASCII characters against [\S\s]
Product: WebKit Reporter: Doug Wright <apple>
Component: JavaScriptCore Assignee: Alexey Proskuryakov <ap>
Status: RESOLVED FIXED    
Severity: Major CC: ap, ddkilzer, hartman.wiki, mrowe, nilcolor
Priority: P2 Keywords: HasReduction
Version: 420+   
Hardware: All   
OS: OS X 10.4   
URL: http://www.dougweb.org/bugzilla/safari/regexpbug/
Attachments (Description: Flags):
Reduced test case: none
a more complete test case: none
a more complete test case: none
proposed fix: darin: review+

Description Doug Wright 2006-08-12 09:51:19 PDT
See testcase. The 2nd alert() should display a few lines of text, but instead displays null because the regexp scanner has barfed upon encountering ’.
Comment 1 Mark Rowe (bdash) 2006-08-12 19:19:19 PDT
Confirmed with WebKit ToT and 418.8.  The character in question is Unicode "RIGHT SINGLE QUOTATION MARK".  Reduction forthcoming.
Comment 2 Mark Rowe (bdash) 2006-08-12 19:20:11 PDT
Created attachment 10008 [details]
Reduced test case
Comment 3 Alexey Proskuryakov 2007-09-19 03:53:07 PDT
*** Bug 15224 has been marked as a duplicate of this bug. ***
Comment 4 Alexey Proskuryakov 2007-09-19 03:56:13 PDT
As seen in bug 15224, this affects all non-ASCII characters, and causes problems in prototype.js. Looks like a very important bug to me.
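A minimal sketch of the expected behavior, not taken from any of the attached test cases: [\S\s] is the union of "non-whitespace" and "whitespace", so it should match any single character, ASCII or not. The helper name `matchesAnyChar` and the sample characters are assumptions chosen for illustration.

```javascript
// Hypothetical helper: true if the class [\S\s] matches the given single character.
function matchesAnyChar(ch) {
  return /^[\S\s]$/.test(ch);
}

// Illustrative samples: ASCII letter, space, U+2019 (the character from the
// original report), plus a few other non-ASCII characters, including the
// no-break space U+00A0.
var samples = ["a", " ", "\u2019", "\u00e9", "\u0416", "\u4e2d", "\u00a0"];

// Collect the code points that [\S\s] fails to match.
var failures = [];
for (var i = 0; i < samples.length; i++) {
  if (!matchesAnyChar(samples[i])) {
    failures.push("U+" + samples[i].charCodeAt(0).toString(16).toUpperCase());
  }
}
```

In an engine without this bug, `failures` stays empty. With the bug as described here, the non-ASCII samples would be expected to show up in it.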
Comment 5 Alexey Proskuryakov 2007-09-20 05:53:11 PDT
Created attachment 16333 [details]
a more complete test case

Tests other regex special characters, too. Passes in Firefox, and mostly passes in IE7, which apparently doesn't treat Unicode whitespace characters as such.
Comment 6 Alexey Proskuryakov 2007-09-22 02:17:25 PDT
This issue is also present in original PCRE 6.1 and 7.4. From comments in the code, I'm not sure what the intended behavior for Perl is, but the fact that \S and [\S] work differently surely looks like a bug.
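The inconsistency can be checked directly. This is a sketch only, not code from the attachments or from PCRE; the helper name `classAgreesWithEscape` and the sample characters are assumptions. It compares the bare escape \S with the single-member class [\S], which should accept exactly the same characters.

```javascript
// Hypothetical helper: true if \S and [\S] give the same answer for one character.
function classAgreesWithEscape(ch) {
  var bare = /^\S$/.test(ch);      // escape used on its own
  var inClass = /^[\S]$/.test(ch); // same escape inside a character class
  return bare === inClass;
}

// Illustrative samples: ASCII and non-ASCII, whitespace and non-whitespace.
var probes = ["a", " ", "\u2019", "\u00e9", "\u00a0"];

// Collect the code points on which the two forms disagree.
var disagreements = [];
for (var i = 0; i < probes.length; i++) {
  if (!classAgreesWithEscape(probes[i])) {
    disagreements.push("U+" + probes[i].charCodeAt(0).toString(16).toUpperCase());
  }
}
```

An engine where the two forms behave identically leaves `disagreements` empty. Any entry marks a character on which \S and [\S] differ.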
Comment 7 Alexey Proskuryakov 2007-09-22 13:07:17 PDT
Created attachment 16349 [details]
a more complete test case

Added a test for a closely related issue from <http://bugs.exim.org/show_bug.cgi?id=580>. That bug was recently fixed, see

svn diff -r218:219 svn://tahini.csx.cam.ac.uk/pcre

I'm going to file the problem with [\S] to PCRE bugzilla soon.
Comment 8 Alexey Proskuryakov 2007-09-22 13:19:12 PDT
(In reply to comment #7)
> svn diff -r218:219 svn://tahini.csx.cam.ac.uk/pcre

I've just found that there's a ViewVC for PCRE: http://vcs.pcre.org/viewvc?view=rev&revision=219

> I'm going to file the problem with [\S] to PCRE bugzilla soon.

http://bugs.exim.org/show_bug.cgi?id=603
Comment 9 Alexey Proskuryakov 2007-09-29 02:41:59 PDT
Created attachment 16446 [details]
proposed fix

This is based on an approach suggested by Philip Hazel, and on his fix for the \S{2} vs. \S\S bug.

I think this fix is important enough to go to trunk.
Comment 10 Darin Adler 2007-10-01 16:54:02 PDT
Comment on attachment 16446 [details]
proposed fix

r=me
Comment 11 Alexey Proskuryakov 2007-10-02 21:42:27 PDT
Committed revision 25958 (feature branch).
Comment 12 Alexey Blinov 2007-10-08 00:36:38 PDT
Hi. Is the feature branch available as a nightly build of WebKit (http://nightly.webkit.org/), or do I have to compile it myself?
Comment 13 Mark Rowe (bdash) 2007-10-08 00:39:30 PDT
Nightly builds of the feature branch are available at http://nightly.webkit.org/builds/overview/feature-branch.
Comment 14 Alexey Proskuryakov 2007-11-04 03:20:57 PST
*** Bug 14877 has been marked as a duplicate of this bug. ***